FOREWORD


The increase in scientific and technical knowledge is creating a need for more efficient
information storage and retrieval. This increase has in turn forced us to expand our
search for more knowledge, compounding the problem. Present manual and mechanical methods
of storing and retrieving information are slow and cumbersome, resulting in increasing
interest in on-line interaction between computers and the data environment for information
processing, retrieval and transfer.

Significant changes are beginning to occur in ideas on the way man uses information.
Involved in these changes is the concept of the computer as an interactive medium. The
concept of using a computer for routine clerical library jobs is being extended to include
the adaptive, real-time, dynamic functions of modern computing machinery for storing and
retrieving information. In experimental studies rapid memorizing and recall capabilities
are being extended to vast quantities of information available in printed documents. With
the prospect of automatic indexing and subsequent file search, a user may eventually be
able to use these facilities to retrieve information from huge repositories, paralleling
recall of information from his own memory. Effective harnessing of the computer's power
promises a revolution in the very nature of future libraries and information systems.

Current limitations in storage and retrieval are largely the result of limited memory
capacity. One has to be satisfied with direct on-line interaction with bibliographic
information, followed by the time-honoured process of reading and manually sorting out the
essential contents from many documents, a good number of which contain little, if any,
really appropriate information. One cannot yet expect to get specific answers directly
to specific questions. Present-day interactive bibliographic searches are effective in
locating bulk substantive information, but are far from solving the basic problem of
information transfer. This basic problem lies in the sorting out and assimilation of the
desired specific information from the mass of documents that have been acquired.

More than half a dozen major international symposia have been held in the past two years on
information storage, retrieval and dissemination. These have generally been slanted either
to the documentalist or to the computer operator. Recognizing the need to fill the gap
between these two viewpoints, and the need for engineers and scientists in NATO member
countries to widen their perspective on the subject of the processing and dissemination
of technical information, the Technical Information and Avionics Panels of AGARD held a
Symposium in Munich, June 18-20, 1968, on "Storage and Retrieval of Information - A
User-Supplier Dialogue". The aim of the Symposium was to emphasize service to the real customer,
the scientist or engineer.

This  volume,  the  Proceedings  of  the  Symposium,  contains  papers  and  discussions  to 
stimulate  and  enlighten  the  scientists  and  engineers  who  are  the  actual,  or  potential, 
users  of  the  systems  described. 

The subject is introduced by presenting the individual points of view of the user who
asks for information and the supplier who stores information for subsequent retrieval.
Present operational manual and mechanical systems are discussed to establish an appreciation
of current practice. The state of the art in scientific and technical aids and the evolution
of current methodology, together with the user needs, are applied to the development of
potential future systems. Once an appreciation of the problems of storage and retrieval
and of possible solutions has been established, the user-supplier loop is closed by discussing
the dialogue which must exist between the two in order to obtain a successful operating system.
AVANT-PROPOS (FOREWORD)


The ever-growing volume of our scientific and technical knowledge leads to the need for
more effective means of storing and retrieving information; this volume in turn leads to
a more thorough search for still more knowledge, making the problem even more complex.
Since the manual and mechanical methods currently used for storing and retrieving data are
slow and cumbersome, there is growing interest in direct interaction between computers
and the data environment for the processing, retrieval and transfer of information.


Ideas about the way man uses information are beginning to change significantly, notably
with the concept of the computer as a means of interaction. The idea of using the computer
for the routine tasks carried out in a library is being extended to the adaptive, dynamic,
real-time functions of modern computing machines intended for the storage and retrieval
of information. Experimental studies are extending rapid storage and recall capabilities
to the vast quantities of information recorded in printed documents. Given the prospect of
automatic means of indexing and searching files, a user may eventually be able to use these
means to retrieve information from large repositories in the same way that he recalls
information from his own memory. Effective use of the computer's power suggests the
upheavals that the very nature of the libraries and information systems of the future
could undergo.

The limitations of present data storage and retrieval systems are due largely to limited
memory capacity. One must be content with direct interaction with bibliographic
information, followed by the time-honoured process of reading and manually sorting the
essential content of many documents, a large number of which contain little, if any, truly
useful data. One cannot yet hope to obtain specific and immediate answers to specific
questions. The interactive bibliographic searches carried out at present make it possible
to locate information "in bulk", but are far from being able to solve the fundamental
problem of information transfer, a problem that lies in sorting and assimilating the
particular data wanted from the vast quantity of documents that have been acquired.


More than six major international symposia have been devoted over the past two years to
the question of the storage, retrieval and dissemination of information. These symposia
were generally aimed at documentalists or at computer operators. In view of the need to
bridge the gap between these two points of view, and of the need for engineers and
scientists in NATO member countries to broaden their view of the processing and
dissemination of technical data, the Technical Information and Avionics Panels of AGARD
organized a Symposium, held in Munich from 18 to 20 June 1968, on the theme "Storage and
Retrieval of Information - A User-Supplier Dialogue". The aim of the symposium was to
emphasize the services to be rendered to the real customer, that is, the scientist or
engineer.


This document, which constitutes the Proceedings of the Symposium, contains papers and
discussions intended to encourage and enlighten the researchers and engineers who are the
actual or potential users of the systems described in it.

As a starting point, the individual points of view are presented of the user, who asks for
information, and of the supplier, who stores information for subsequent retrieval. The
manual or mechanical systems currently in operation are examined to establish an
appreciation of current practice. The present state of knowledge in the field of scientific
and technical aids, and the evolution of current methodology, together with the needs of
the user, are applied to the development of possible future systems. Once an appreciation
has been established of the problems posed by the storage and retrieval of data, and of the
solutions that may be found, the user-supplier loop is closed by discussing the dialogue
that must exist between the two if a successful operational system is to be achieved.
OPENING REMARKS


by  Director  Finn  Lied 
Director  of  AGARD 


It gives me a great deal of pleasure to extend to you all a cordial welcome to this
symposium on the storage and retrieval of information. I am very pleased to see that we
have such a good representation of scientists and engineers, that is, the users and
producers of information, as well as documentalists and computer scientists. I should
like also to welcome the considerable number of visitors and hope that they will find this
introduction to the work of AGARD both profitable and enjoyable.

I  wish  to  express  our  grateful  thanks  to  the  Federal  German  Government  for  its  invitation 
to  meet  in  this  ancient  city  of  Munich  and  for  the  ample  facilities  that  have  been  provided 
so  that  you  who  are  participating  in  the  symposium  will  not  only  stimulate  your  professional 
interests  in  comfort  but  will  also  be  able  to  enjoy  the  cultural  and  popular  features  of 
the  historic  capital  of  Bavaria  and  the  magnificent  scenery  of  the  district. 

The  symposium  programme  is  a  joint  effort  by  the  two  Panels  on  Avionics  and  Technical 
Information  which  are  respectively  responsible  for  the  computer  and  documentation  papers. 
With  two  completely  different  subjects  such  as  these  it  is  vitally  important  that  there 
shall  be  full  understanding  between  the  two  sets  of  practitioners  so  that  what  the  one  can 
supply  agrees  as  nearly  as  possible  with  what  the  other  needs. 

There is, however, another division, between the scientists and engineers who produce
the information and who are also the users of it, and those who store and retrieve it.
Here again, unless what the one supplies corresponds closely with what the other needs, there
is a grave danger of duplication of effort. Thus the theme of the symposium is "A
User-Supplier Dialogue" and I hope that NATO scientists and engineers who use and produce
information will be stimulated to express their views and elaborate their needs.

AGARD  Panels  cover  a  very  wide  field  and  each  Panel  consists  of  specialists  in  that 
particular  field.  Most  new  developments  need  contributions  from  several  fields  and  it  is 
here that AGARD can make its most productive contribution by bringing together in joint
meetings  or  symposia  the  specialists  of  the  several  fields.  Here  we  have  the  specialists 
in  documentation  and  computers  talking  to  those  working  in  other  fields  and  I  am  sure  that 
all  will  benefit. 
WELCOMING REMARKS


by Dr Th. Benecke

President of the Bundesamt für Wehrtechnik und Beschaffung in Koblenz
and National Delegate to AGARD

Ladies  and  Gentlemen, 

I  have  the  honour  and  the  pleasure  to  welcome  you  on  behalf  of  the  Minister  of  Defence 
and in my capacity as National Delegate to AGARD.

We would like to thank you for having accepted our invitation to hold the joint meeting
of the AGARD Avionics and Technical Information Panels on "Storage and Retrieval of
Information" here in Munich.

As was done on previous occasions by the "Wissenschaftliche Gesellschaft für Luft- und
Raumfahrt" (WGLR), the meeting has been organised this time by the new "Deutsche Gesellschaft
für Luft- und Raumfahrt" (DGLR), which has been set up by combining the WGLR and the "Deutsche
Gesellschaft für Raketentechnik und Raumfahrt" (DGRR). I should also like to welcome
you on behalf of this association.

The central location of the Künstlerhaus, in which this meeting is taking place, offers
you  a  good  opportunity  to  visit,  in  addition  to  your  programme,  the  attractions  of  Munich 
and  to  appreciate  the  beauty  of  the  city.  I  would  like  to  mention  here  the  reception  to 
be  given  by  the  Mayor  of  Munich  at  the  Town  Hall  and  to  which  you  are  all  cordially  invited. 

Your meeting is devoted to a very important topic. Ever-increasing scientific literature
has made the handling of documentation a prominent problem, and we can only hope that
modern technical equipment and procedures will help us to master this problem in the
future. No doubt the planned presentations and the discussions will help to define
the difficulties that arise and to find ways and means leading to practical solutions.

I  wish  you  a  successful  meeting. 
COMMUNICATION AND SECRECY IN SCIENCE

by

R. Schrader

Ministry of Defence, Germany
SUMMARY


The increasing volume of material published is giving rise to the use of
computers in the assembling, processing and indexing of research results.
Experience shows that however modern the computer, it can never make critical
judgement or interrogation of the data it handles. Direct contact (by
correspondence, discussions, conferences) will continue to play a major
role in communication.

Secrecy, whether military or industrial, is always to be deprecated from
the  viewpoint  of  scientific  research  and  should  never  be  applied  to  pure 
research  and  broad  fields  of  basic  knowledge.  Secrecy  can  only  bring  short 
term  advantages. 

Finally, recommendations made by the NATO Science Committee on exchanges
of scientific information, co-operation in publication and co-ordination of
documentation centres are recalled.




COMMUNICATION  AND  SECRECY  IN  SCIENCE 
R.  Schrader 


1.  INTRODUCTION 

Modern science places an ever-increasing demand on the communication of its results,
and one of the most serious problems of today is that of providing the professional worker
with speedy access to the specialized knowledge in his field of interest. In fact, no
research and development programme can nowadays be initiated and carried out with some
hope for success without having good documentation available. But there is also a great
danger that science as such will fragment into a number of independent disciplines which bear
little relationship to each other, unless we are able to cope with the "information
explosion". Advancement in the science and technology of documentation had been slow
until relatively recently, but with the mechanization and automation of a great many of
its processes, progress is now fairly rapid. Against this background, the AGARD Panels
for Avionics and Technical Information agreed to jointly organize a symposium devoted to
the subject of "Storage and Retrieval of Information". When the planning for the symposium
began, it was noted that several conferences on much the same subject had been held in
recent years, and the question was raised as to whether there was indeed a need for
another meeting of this kind. On examining the programmes of these conferences, however,
it was realized that, in the main, they had focussed attention on those problems which
are of primary interest to the documentalist. In fact, none of these conferences had
offered much opportunity to the scientist, as the user of scientific knowledge, to discuss
his needs in the fields of information processing and dissemination. For this reason,
both AGARD Panels eventually reached agreement to propose a programme which should fill
this gap and should serve both professional groups in a joint meeting.

As one may see from the programme, the symposium intends to demonstrate how modern
storage and retrieval concepts assist the documentalist in the handling of scientific
information, and scientists in the audience are expected to express their views on a number
of newly proposed designs and methods. But the scientists are also invited to the meeting
as active contributors. I know that in a number of countries great efforts are being made
to introduce the modern techniques of data handling into the traditional fields of
documentation and that promising results are expected in the not too distant future, including
automatic reading and language translation, based on the application of modern computers.

Hence, the success of the symposium will depend on the co-operation and joint
participation of both professional groups. To this end, invitations for the symposium have gone
out  to  all  AGARD  Panels  and  other  scientists  interested  in  the  subject,  and  I  note  with 
great  appreciation  how  many  of  them  have  come,  in  addition  to  the  great  number  of 
documentation  experts. 

As I have been invited to represent the scientist's point of view at this meeting, I
intend to discharge my task by discussing communication and secrecy in science. In so
doing, I recognize the important role documentation plays in the general process whereby
science advances and our scientific knowledge grows.


2.  EARLY  HISTORY  OF  DOCUMENTATION 

Documentation appears to be almost as old as man's interest in science and literature,
and among the earliest collections of manuscripts were archives attached to temples and
palaces. The great philosophers and scientists of antiquity were active collectors of
volumes, and Aristotle was the first man known to possess a collection worthy of the name
"library".

For  many  centuries,  the  library  at  Alexandria  represented  the  intellectual  centre  of 
the  ancient  world.  It  is  difficult  to  give  a  precise  figure,  but  it  can  be  said  that  the 
number  of  manuscripts  collected  from  all  parts  of  the  world  and  available  in  the  library 
was  very  large;  however,  it  should  be  kept  in  mind  that  the  papyrus  roll  of  antiquity 
usually  contained  less  written  matter  than  a  modern  book.  The  catalogues  and  classified 
lists  issued  by  the  library  were  among  the  earliest  experiments  in  bibliography. 

Although  the  Romans  made  excellent  use  of  technology,  they  did  not  particularly 
encourage  scientific  advancement.  As  a  matter  of  fact,  the  Romans  were  not  scientifically 
imaginative, whereas they demonstrated outstanding capabilities in such fields as
jurisprudence, political and military affairs. Science, therefore, remained Greek in nature
and  spirit  under  Roman  domination,  and  it  was  not  until  the  last  century  of  the  republic 
that  we  hear  of  libraries  in  the  capital.  These  libraries  ceased  to  collect  Greek  writings 
when  in  the  year  330  the  capital  was  removed  to  the  Bosphorus.  Eventually,  the  aggressions 
and  intrusions  of  the  Germanic  tribes  swept  away  the  classical  learning  from  Italian  soil 
and, with the fall of Rome in 476, the ancient history of documentation came to an end.

When  Christian  literature  began  to  grow,  libraries  became  part  of  the  ecclesiastical 
organizations.  The  abbey  of  Monte  Cassino  founded  in  Italy  about  529  was  the  first  monastery 
in which a library of religious works was established, and this custom rapidly spread to
all  parts  of  the  world.  Monks  began  to  write,  and  the  use  of  vellum  instead  of  papyrus 
resulted  in  the  replacement  of  the  ancient  roll  by  the  bound  book  as  we  know  it  today. 

The Renaissance brought about a gradual broadening of man's intellectual horizon and
an ever-growing desire for the collection of manuscripts and books outside the monasteries.
To satisfy this desire, large public libraries were created in France and Italy. When in
1453 Constantinople was conquered by the Turks, a full stream of Greek scholars began to
flow into Western Europe, and the impact of Greek philosophy on Latin Christianity produced
a powerful surge of intellectual activity and a keen interest in science. This development
created a high demand for more and more books, and it is one of the miracles of history
that the printing press was invented almost at the same time as Constantinople fell into
the hands of the Turks. With the enormous increase of the scientific literature, new
plans for libraries had to be developed, and the modern history of documentation began.


3.  COMMUNICATION  AND  SCIENTIFIC  RESEARCH 

Science and technology can only grow and blossom in an environment which provides for
free communication between scientists and for the rapid transmission of scientific
knowledge to the engineers who always strive for innovations and continuously wish to use new
discoveries for practical purposes. Even in the fields of defence research and development,
progress depends on the freedom of exchanging technical information. In fact, communication
has always been a necessary part of the scientific process, and there is today everywhere
in the world a growing awareness of its increasing importance for the advancement of
science. The more the body of scientific knowledge grows, the greater is the need for
communication; and the more communication is provided, the better are scientists able to
carry out research.

Until the latter half of the 17th century, communication was by way of direct
correspondence whereby scientists kept each other informed about the work they were doing. At
about this time, however, scientific journals began to appear, providing a much better
method of communication. In addition, they provided scientists with opportunities to
proclaim their discoveries to the world, in an effort to gain recognition.




Today, the vast growth of scientific activity makes it difficult for the individual
scientist to keep up with the ever-increasing number of publications. So much is
nowadays published that he is completely incapable of studying all papers which could be of
some help to him in his own field of research. This situation is often referred to as
the "crisis in communication", and I share the hope that in the not too distant future a
modern computer will be designed whereby the steadily growing flood of new scientific
information is indexed, processed and assembled, in order to get us over all the difficulties of
this crisis. But whatever computers will be capable of doing in calling a scientist's
attention to a piece of scientific information which might be critical to his work, they
will never be able to distinguish qualitatively between a good paper and a bad paper,
neither will they answer those crucial questions which scientists are in the habit of
asking. For this reason and many others, open discussions among scientists working in the
same field - scientists who correspond regularly with each other and meet repeatedly at
conferences - will continue to play a major part in the communication process and hence in
the advancement of science.

When  in  former  times  the  number  of  scientists  was  small,  a  scientific  discovery  was 
usually  the  product  of  one  single  mind  and  emerged  at  one  particular  moment.  Today, 
scientists  are  counted  in  hundreds  and  thousands,  and  we  are  likely  to  find  that,  at  any 
one moment, a good many of them are engaged in the same piece of research and are about
to  publish  much  the  same  results.  As  a  matter  of  fact,  there  are  almost  always  several 
laboratories  in  the  world  which,  in  the  pursuit  and  exploitation  of  new  scientific  ideas, 
move  along  the  same  lines  of  approach,  and  it  is  well  remembered  how  much  research  was 
done  in  parallel  during  the  Second  World  War,  when  major  areas  of  science  were  regulated 
by demands of military security and scientific communication was almost non-existent.


4. IMPOSING SECRECY ON SCIENCE

In  war,  but  also  in  peace,  scientific  information  of  direct  military  impact  must  be 
controlled by security regulations in order to prevent its premature leakage to an enemy.
Hence, this information must be withheld from the traditional channels of scientific
communication by some sort of classification, and nobody, of course, would question the
need that in the interest of national defence certain kinds of scientific information
must  be  protected.  In  fact,  military  secrecy  is  necessary  and  often  vital  to  our  survival. 
But  in  a  real  sense,  it  is  bad  for  science. 

Classified research is almost always in danger of suffering in quality, as this kind
of work is not exposed to scientific criticism to the same extent as research which is
done in an open laboratory. Secrecy prevents the free discussions which are so important
for scientific progress. Secrecy furthermore results in large parts of the scientific
community being kept in ignorance, at least for some time. It should be added that
scientists carrying out research under conditions of secrecy are less likely to enjoy the
overwhelming wealth of scientific ideas than their colleagues working in a free and open
environment. And as it frequently happens that a scientist fails to realize the
significance of his own findings, it becomes all the more important that scientific observations
be made available to others for further studies. Indeed, secrecy and great scientific
thoughts cannot thrive together.

Secrecy also plays an important role in industrial research. It is difficult to think
of a firm which devotes substantial funds to research revealing to its competitors
scientific information gained at great expense. However, it has seldom happened in the past
that the leakage of scientific knowledge caused a real loss to a company. On the contrary,
the  free  release  of  basic  information  has  in  almost  all  cases  been  an  advantage  to  all 
firms  working  in  the  same  field  of  manufacture. 

Secret scientific information often continues to remain classified, even if it has
lost its military value almost entirely, and consequently does not become available to
scientists as rapidly as that in the open literature. It is therefore necessary that
methods be introduced whereby scientific information will not remain classified long after
its secrecy has ceased to have military significance. In my view, the proposal that some
positive  action  should  be  required  to  maintain  classification  after  a  certain  length  of 
time  merits  consideration.  Today,  action  needs  to  be  taken  in  almost  all  cases  in  order 
to  obtain  declassification,  since  in  present  security  procedures  classification  is  heavily 
favoured  over  declassification. 

Access to classified information is generally regulated by the criterion of
"need-to-know". While this criterion can be easily applied to information of tactical military
value, it can hardly be used for information of scientific content, as nobody in this
world is able to say precisely in advance which piece of scientific information may or may
not be of benefit to a particular research project. Because of a narrow interpretation
of the need-to-know criterion, it has happened many times in the past that information
which was not available to a scientist at the right moment has made the difference between
the success and the failure of a research project. It is therefore very important to
ensure  that  the  criterion  of  need-to-know  be  intelligently  used  and  not  be  allowed  to 
hamper  the  free  flow  of  scientific  information.  But  it  is  also  necessary  to  set  up  some 
form  of  special  information  service  whereby  those  scientists  who  are  entitled  to  see 
classified  information  are  aware  of  its  existence  in  order  to  avoid  costly  and  wasteful 
duplication  in  defence  research  to  as  great  an  extent  as  possible. 

Classification  practices  adopted  by  most  countries  overlook  the  fact  that  a  scientific 
discovery  made  by  one  scientist  can  never  be  kept  secret  for  any  length  of  time,  as  one 
day, the same discovery will be made by another scientist. Whenever this happened in the
past to a piece of scientific information which for security reasons was classified and
consequently not published in the usual way, it was in almost all cases the popular belief
that the information had been stolen by espionage. In reality, however, the information
was  rediscovered,  as  nature  and  her  laws  are  open  to  every  intelligent  mind  throughout  the 
entire  world,  and  no  nation  may  claim  a  monopoly  to  be  the  only  one  with  a  capability  of 
producing  scientific  ideas. 

What  do  we  gain  by  imposing  secrecy  on  scientific  research?  Obviously,  the  main  prize 
is  time;  and  this  is  probably  all  we  ever  gain  in  any  scientific  field,  as  we  may  expect 
to  prolong  the  time  it  takes  our  potential  enemies  or  competitors  to  learn  what  we  already 
know. 

In  fact,  secrecy  does  not  play  a  useful  part  in  science,  and  security  regulations  should 
never be applied to pure research and broad fields of basic knowledge. Although the
withholding of fundamental scientific information may occasionally provide short-term military
advantages,  in  general  it  is  detrimental  to  scientific  progress  and  for  this  reason  should 
always  be  avoided. 

5.  NATO’S  INTEREST  IN  SCIENTIFIC  COMMUNICATION 

Within  the  Atlantic  Alliance,  the  AGARD  Technical  Information  Panel  plays  quite  an 
exceptional  part  in  the  sense  that  it  is  the  only  body  of  NATO  which  continuously  deals 
with documentation and its various problems. Established by AGARD in 1953, the Panel has
launched a broad programme in the field of aeronautical publications, but has also served
NATO in other fields of documentation, whenever called upon.

At the request of the NATO Science Committee, the Technical Information Panel undertook
in  late  1958  to  study  ways  and  means  whereby  the  exchange  of  scientific  information  within 
the  Atlantic  Alliance  could  be  improved.  As  a  result  of  these  studies,  the  Panel  recommended 
in  March  1959  that  a  documentation  liaison  unit  be  established,  which  should  not  perform 
functions usually provided by an efficient documentation centre but should keep in touch
with  research  and  development  activities  and  supply  information  to  those  scientists  in 
the  NATO  countries  who  are  in  need  of  this  information.  Initially,  this  recommendation 
met  with  great  enthusiasm,  but  was  later  on  considered  difficult  to  implement,  as  a 
documentation liaison unit of the kind envisaged by the AGARD Panel would be of little
use  in  the  case  of  classified  information  which  by  necessity  must  be  exchanged  between 
nations  on  the  basis  of  bilateral  agreements.  Consequently,  the  idea  of  a  documentation 
liaison unit was dropped, with the understanding however that the subject of improved
information  exchange  among  the  NATO  countries  be  studied  along  other  lines. 

In 1962, the Technical Information Panel submitted to the Science Committee a report
which  dealt  with  the  exchange  of  scientific  information  in  the  defence  field.  This  report 
pointed  to  the  need  that  there  should  be  at  least  one  defence  documentation  centre  in  each 
member  country  of  NATO  and  that  countries  without  such  centres  should,  as  a  matter  of 
urgency, initiate their establishment. All defence documentation centres should be equipped
with  the  most  efficient  information-handling  techniques  and  should  endeavour  to  arrange 
the automatic release of all unclassified information to the corresponding centres of the
other  countries.  As  far  as  the  exchange  of  classified  scientific  information  is  concerned, 
the  report  recommended  that  countries  having  mutual  interests  in  a  given  field  make 
bilateral arrangements. But pending further improvements and as an initial step, lists
of  unclassified  titles  of  classified  information  should  be  given  widespread  distribution. 

The  report  of  the  Technical  Information  Panel  was  approved  by  the  North  Atlantic  Council 
in  June  of  1963,  and  the  governments  of  the  member  countries  were  subsequently  invited  to 
take such action as they deemed necessary for the implementation of the recommendations.
Recognizing  the  significance  of  the  issue,  the  Council  furthermore  agreed  that  work  in  the 
field of documentation should continue, as indeed the communication of scientific knowledge
within  the  Atlantic  Alliance  is  a  problem  area  of  greatest  importance  to  NATO. 

In this discussion, I have referred twice to the NATO Science Committee and its interest
in documentation. Now, I should like to call your attention to the Committee’s report on
“Increasing the Effectiveness of Western Science”, issued in the autumn of 1960. Actually,
the  report  was  written  by  a  special  study  group  set  up  by  the  Committee,  but  it  reflects 
the  Committee’s  attitude  towards  the  need  for  accelerated  scientific  progress  in  the  Western 
world by means of enhanced international co-operation. As one may expect, the report
deals  with  documentation,  and  it  appears  to  me  to  be  of  some  significance  in  this  context 
to mention briefly the main recommendations, as they describe the demands for improved
methods  in  publication  and  documentation  in  an  appropriate  way. 

The recommendations, as they are listed in the Committee’s report, call for closer
co-operation  in  publication  practices  between  the  chief  editors  of  the  main  scientific 
journals;  the  proliferation  of  scientific  journals  should  be  discouraged;  the  activities 
of  all  documentation  centres  should  be  co-ordinated,  and  a  single  international  system  of 
indexing should be introduced; authors of scientific publications should be invited to
supply, together with their papers, abstracts which are to be edited according to specific
rules and to be classified in accordance with a single and unified system; the abstracts
should  be  published  immediately;  experienced  scientists  should  be  encouraged  to  periodically 
review  broad  fields  of  scientific  research  and  to  summarize  their  results  in  an  efficient 
way. 

These are the main recommendations made by the NATO Science Committee when dealing with
the problems of documentation some years ago. They demonstrate the importance the Committee
attaches  to  these  problems.  Needless  to  say,  these  recommendations  are  still  as  valid 
today  as  at  the  time  they  were  written. 


6.  SCIENCE  AND  DOCUMENTATION 

The  critical  review  of  wide  research  fields  periodically  done  by  competent  scientists 
serves  a  valuable  purpose  in  summarizing  large  portions  of  the  available  scientific 
literature.  In  my  view,  this  kind  of  work  should  be  given  more  credit  than  in  the  past. 
If  wisely  done,  it  will  certainly  assist  us  in  our  desire  to  cope  better  with  the  ever- 
increasing  volume  of  scientific  publications. 

As the selection and presentation of scientific information can only be carried out
intelligently by those who originate the knowledge, scientists must nowadays devote
greater efforts to these activities than before. As a matter of fact, research can no
longer be regarded as being completely separated from the communication of its results,
and scientists must nowadays join the professional documentalists and accept
responsibilities for the transmission of scientific information to the same extent to which they bear
responsibility for research.

Scientists  often  produce  a  certain  amount  of  redundancy  when  they  publish  their  work, 
and  it  frequently  happens  that  they  issue  for  the  same  bit  of  research  several  reports  all 
of which seem identical to the documentalist. There are reasons for this, one being that
this  is  the  way  whereby  scientists  make  their  work  known  and  consequently  stand  a  better 
chance  of  gaining  prestige  and  stature  in  the  world  of  science.  The  essential  point  about 
publication  is  that  scientists  should  always  be  conscious  of  the  right  to  publish,  and,  in 
my  view,  they  should  take  full  advantage  of  this  right,  as  they  have  indeed  something  to 
say  to  the  world  worth  consideration,  whether  it  is  of  pure  character  in  the  sense  that  it 
does  not  relate  as  yet  to  some  known  field  of  exploitation  or  whether  it  is  of  a  more 
practical  nature  in  a  given  area  where  the  possibilities  of  material  application  are 
already  well  recognized.  Nevertheless,  the  subjection  to  a  kind  of  self-control  practiced 
by the authors of scientific reports will certainly help the documentalists to overcome
at  least  some  of  their  difficulties. 

There  is  another  point  which  is  critical  for  a  fruitful  co-operation  between  the  scientist 
and the professional documentalist. How should our young scientists be trained to make
better use of existing documentation facilities? When I was a student years ago, very
little  was  provided  in  documentation  training,  and  we  first  became  aware  of  the  existence 
of  and  acquainted  with  the  many  problems  involved  when  we  started  to  do  laboratory  work  on 
our own. As by that time we did not know how to handle documentation at all, each of us
began to develop his own method, so to speak, and I believe that the lack of proper
guidance in this field is often the reason why so many scientists today prefer to organise
their own documentation rather than to rely on the official information services. There
is certainly a great need for improvement, and I sincerely hope that in the future
universities will offer their students better training in documentation.

In  conclusion,  I  should  like  to  say  that  in  my  judgement  the  key  to  the  solutions  of 
the  great  many  documentation  problems  lies  in  some  kind  of  co-operative  approach,  involving 
the symbiotic activities of scientists and documentalists. Both professional groups must
work  together  in  an  atmosphere  of  mutual  respect  and  appreciation,  and  it  seems  to  be  very 
important to enhance their interplay. I trust this symposium will contribute to this
co-operation  and  will  set  an  example  for  many  more  meetings  of  this  type  to  be  held  in  the 
years  to  come. 

DISCUSSION


R. C. Wright: Do you think that in seven-and-a-half years sufficient progress has been made
with the NATO Science Committee's recommendation on co-ordination of documentation centres
and introduction of an international system of indexing?

B. Schrader: No; although the report was submitted to the NATO Council it has never
received official standing in the NATO community. A recently prepared evaluation report
by ten eminent scientists may be more effective.

S. C. Schuler: Does TIP activity include preparation of guidelines for authors on effective
titling  and  abstracting  of  reports? 

H. F. Vessey: Answering as Chairman of TIP, AGARD Specification 1 (issued 1956 and revised
1968), which is sponsored by the Panel, stresses the importance of these points.

D. Bosman: Does the education of scientists and technologists suffer as a result of security
classification  of  reports? 

N. N. Tanyol49: Post-graduate students doing research work suffer from not having all
possible  information  available  to  them. 

K. G. Schjetne: Is TIP or the NATO Science Committee giving any consideration to what instruction
in  documentation  university  students  should  receive? 

H. F. Vessey: TIP have not had discussions specifically on this matter. It is probably
best for each country to devise methods of introducing instruction on documentation into
courses, distinguishing between instruction in the use of documentation for students in
general and the more detailed instruction for those who are intending to work in
documentation centres. In this way much is already being done.

W. Spiess: In the absence of the use of a common indexing system in the NATO community,
could  the  AGARD  publish  a  memorandum  listing  the  systems  in  use  in  various  member  countries 
and  perhaps  giving  cross  references  between  one  system  and  another? 

H. F. Vessey: This is a very difficult task and although efforts have been made to accomplish
this  over  the  years  it  has  not  yet  proved  possible.  The  LEX  Thesaurus  might  be  acceptable 
as  a  common  indexing  system. 

D. Bosman: How does one guard security classified information stored in an automatic data
processing  system? 

L. Feidelman: One answer is to isolate the data processing facility in a room accessible
only to those with a security clearance. Classified documents can be indexed from their
unclassified  titles. 

R. D. Kerr-Waller: Another solution is to have a small computer for the classified data
only with access to the main computer. Use of codes to identify classified information
in  the  general  collection  is  useless  because  the  codes  can  always  be  broken. 

SUMMARY


The tasks, problems and equipment available to the documentalist or
information officer are outlined. The types of information agencies
(libraries, documentation centres, information analysis centres and referral
centres), and the services each provides, are briefly described.

The  economic  need  to  link  information  retrieval  by  computer  with  other 
activities  such  as  preparation  of  abstract  journals,  indexes,  SDI,  is 
stressed. 

Future developments in photographic methods of storing text,
preparation of reports on tape, machine reading of texts, remote display,
rapid  printing  techniques  and  education  of  documentation  service  users 
are  discussed. 

THE SUPPLIER’S POINT OF VIEW - INTRODUCTORY PAPER

H. F. Vessey


1.  GENERAL 

The field of information storage and retrieval requires definition, for it can extend
from battlefield surveillance to the booking of seats by an airline. The theme of the
papers to be delivered at this symposium is mainly documentary, that is the storage and
retrieval of documents, such as reports, which contain the information requested. However,
data, such as physical constants or properties of materials, will also be included.

As Dr Schrader has said, there have been a number of international symposia on this
subject but the emphasis has been mainly on technique. Here it is hoped that the
“customer” will take precedence and that the Scientists and Engineers will tell us what
is required. I hope too that the computer engineers and documentalists will concentrate
on “Service to the Customer” rather than lose us in the thickets of programming or the
forests  of  indexing. 

My  paper  is  intended  to  form  a  background  to  the  specialist  papers  that  follow,  to  fill 
in  some  of  the  gaps  caused  by  the  limitation  in  numbers  of  speakers  and  finally  to  outline 
the  task,  the  problems  and  the  equipment  available  to  the  supplier,  that  is  the  Documentalist 
or  Information  Officer. 


2.  HISTORICAL 

Dr  Schrader  has  outlined  the  history  of  documentation  but  I  should  like  to  draw  your 
attention  to  the  fact  that  in  about  500  B.C.  there  was  a  library  of  10,000  documents  at 
Nineveh  which  seem  to  have  been  systematically  arranged  and  catalogued.  “Documents”  is 
perhaps  the  wrong  description  for  these  were  clay  tablets  and  must  have  presented  some 
interesting storage and retrieval problems where mechanisation might well have been
appropriate.  The  earliest  reference  to  mechanisation  the  author  has  managed  to  trace  is 
a short description in “Gulliver’s Travels” (circa 1700) of a machine to allow the writing
of  learned  theses  by  the  random  selection  of  words  and  phrases. 


3.  N.A.T.O.  DEFENCE  DOCUMENTATION  CENTRES 

The  task  of  the  National  Defence  Centre  in  a  N.A.T.O.  country  is  to  make  known  and  supply 
to the country’s scientists and engineers the unpublished and sometimes published literature
in  their  field. 

This is done in several ways:

(a)  By  supplying  to  individual  scientists  the  new  literature  in  their  fields. 

(b)  By  issuing  accession  lists,  or, 

(c)  By  preparing  Abstract  Journals  of  new  material. 

(d)  By  supplying  relevant  reports  in  response  to  an  enquiry. 

Further, the National Defence Documentation Centre has the responsibility of making
known,  in  other  countries,  the  work  of  its  own  scientists. 

The broad lines of the operations are as follows:

3.1  Acquisition 

The centre receives from National Establishments, Universities and from abroad a wide
collection  of  reports  and  in  addition  may  scan  published  material. 

3.2 Recording, Abstracting and Indexing

These  are  processes  essential  to  any  large  organisation  in  order  to  allow  of  retrieval. 

A growing number of reports, in line with A.G.A.R.D. recommendations, now contain author
abstracts and even if these are not used directly they now require little editing.
Indexing to allow retrieval is a problem which will be discussed later (paragraph 7).

3.3  Translation 

Most  Centres  will  require  translation  of  some  of  the  foreign  reports  and  this  frequently 
causes  difficulty  as  technical  knowledge,  as  well  as  linguistic  skill,  is  required. 

3.4 Announcement

All  Centres  prepare  and  circulate  lists  of  books,  journals  and  reports  received.  In  the 
case  of  the  larger  ones  these  usually  take  the  form  of  Abstracts  Journals  where  the  new 
material  is  listed  under  subject  headings  and  an  abstract  is  given. 

3.5 Circulation

Circulation  procedure  varies  but  typically  a  Centre  distributes  both  at  home  and  abroad 
according  to  agreed  lists  which  may  be  standardised  but  which  are  generally  unique  to  each 
report. The reports it receives from outside will be circulated to a smaller circle of its
own  scientists  either  by  lists  or  by  a  knowledge  of  particular  interests. 

3.6  Requests 

Requests may be for a report given as a reference or listed in an Abstract Journal, in
which case a quick clerical search to guard against transposition of numbers, etc. is
adequate. More difficult are requests for reports on a specified subject. Here a subject
search is required and the success depends on several factors of which the more important
are:


1.  The  detail  in  which  the  enquirer  can  state  what  he  wants. 

2.  The  skill  of  the  Information  Officer  in  converting  the  request  into  the  index  terms. 

3. The thoroughness of the original indexing.

4.  Finally  the  mechanics  of  the  retrieval  process. 
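
The interplay of factors 2 to 4 can be illustrated with a small coordinate-index search. The sketch below, in Python, is only an illustration under stated assumptions: the report numbers, the index terms and the function names build_index and search are hypothetical and do not describe the system of any particular Centre.

```python
def build_index(reports):
    """Map each index term to the set of report numbers indexed under it."""
    index = {}
    for number, terms in reports.items():
        for term in terms:
            index.setdefault(term.lower(), set()).add(number)
    return index


def search(index, request_terms):
    """Return the reports indexed under every term of the request."""
    postings = [index.get(term.lower(), set()) for term in request_terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical collection: report number -> terms assigned by the indexer.
REPORTS = {
    "R-101": ["wings", "flutter", "wind tunnels"],
    "R-102": ["wings", "icing"],
    "R-103": ["flutter", "tailplanes"],
}

if __name__ == "__main__":
    index = build_index(REPORTS)
    print(search(index, ["wings", "flutter"]))
    print(search(index, ["wings", "buffeting"]))
```

The second request finds nothing because no report was indexed under one of the terms requested, however relevant its contents may be. The search can only be as good as the original indexing and the conversion of the request into index terms.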

Finally  there  is  the  question  of  search  for  data.  The  documentalist  will  make  a  subject 
search  to  turn  out  literature  in  which  the  data  may  be  expected  to  be  recorded.  If  this  is 
not  successful  he  will  probably  attempt  to  put  the  enquirer  in  touch  with  a  specialist  in 
the  field.  In  most  organisations  data  searches  are  difficult,  costly  and  not  altogether 
satisfactory  and  there  is  now  a  tendency  to  concentrate  this  type  of  work  in  specialist 
centres  (see  paragraph  5). 

3.7 Bibliographies

Bibliographies may be prepared as a result of a subject search or from known interests
in a specific field. They may or may not include abstracts but should preferably be
arranged in an ordered manner by subsidiary subject.

3.8  “Security”  Control 

Circulation,  announcement  and  loan  of  reports  must  be  controlled  so  that  copies  are 
only  sent  to  those  entitled  to  receive  them.  Restrictions  include  Security  Classification 
(Confidential, Secret, etc.) but it should be noted that many reports marked “Unclassified”
may not be widely distributed because of proprietary rights, patent questions or policy.
Such  Unclassified  reports  which  must  have  limited  distribution  should  obviously  be  marked 
appropriately  and  there  is  now  a  growing  tendency  to  mark  the  remainder  “Unlimited”  to 
indicate  that  there  are  no  restrictions  on  circulation. 
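
A mechanised "security" check of this kind might be sketched as follows. This is a minimal illustration in Python, assuming a simple ordering of classification levels and an optional distribution list per report; the field names, the level table and the function may_receive are hypothetical and are not taken from any Centre's procedure.

```python
# Assumed ordering of levels; real schemes contain further grades.
LEVELS = {"Unclassified": 0, "Confidential": 1, "Secret": 2}


def may_receive(report, recipient):
    """Decide whether a recipient is entitled to receive a report."""
    # Security classification: the recipient's clearance must be high enough.
    if LEVELS[recipient["clearance"]] < LEVELS[report["classification"]]:
        return False
    # Limited distribution (proprietary rights, patents, policy):
    # None stands for "Unlimited", otherwise only listed recipients qualify.
    limited_to = report.get("limited_to")
    if limited_to is not None:
        return recipient["name"] in limited_to
    return True


if __name__ == "__main__":
    report = {"classification": "Unclassified", "limited_to": {"Establishment A"}}
    print(may_receive(report, {"name": "Establishment A", "clearance": "Secret"}))
    print(may_receive(report, {"name": "University B", "clearance": "Secret"}))
```

The example report is Unclassified yet limited, so a recipient with the highest clearance is still refused unless named on the distribution list.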

3.9 Exchange

An important activity is the exchange of reports with other similar Centres. Formal and
informal exchange agreements are made, particularly with Documentation Centres in other
countries. It is here that A.G.A.R.D. has made a major contribution, both in encouraging
exchange  and  in  bringing  together  the  Heads  of  the  National  Documentation  Centres  in  the 
Technical  Information  Panel  where  common  difficulties  may  be  frankly  discussed. 

4. OTHER CENTRES


In  N.A.T.O.  countries  there  are  a  number  of  documentation  units  or  Agencies  apart  from 
the  National  Defence  Documentation  Centres.  Each  large  research  establishment  will  have 
its  own  library  or  Information  Agency  and  there  will  be  the  civil  technical  libraries 
attached  to  Universities,  Research  Associations,  etc.  Methods  of  operation  will  vary  but 
the  basic  requirements  as  previously  described  apply  but  with  a  modified  emphasis.  It  is 
highly desirable that there shall be collaboration between all these areas of
documentation.


5.  TYPE  OF  AGENCY 


The  types  of  Agency  may  be  described  under  four  main  headings  although  there  is  usually 
considerable  overlap  of  activities  particularly  in  the  first  two. 


5.1 Libraries


Libraries deal with published information in the form of books, periodicals etc. In
most  cases  the  records  used  for  retrieval  consist  of  the  bibliographic  information  (Title, 
Author  and  Publisher)  and  a  broad  statement  of  the  main  subject. 


5.2  Documentation  Centres 


The  main  task  is  the  collection,  announcement  and  dissemination  of  unpublished  report 
literature  although  published  literature  may  also  be  processed.  The  records  are  usually 
kept in a more detailed form and most Centres will make subject searches, prepare
bibliographies etc.


5.3  Information  Analysis  Centres* 


These  work  in  a  specialised  field,  have  direct  access  to  working  scientists  in  that 
field  and  will  provide  data,  advice  and  evaluation  in  addition  to  answering  detail  subject 
enquiries. 


*Sometimes called Specialised Information Centres

5.4 Referral Centres

Referral  centres  accept  enquiries  and  refer  them  to  the  organisation  best  fitted  to 
reply. 

National  Defence  Documentation  Centres  fall  within  the  second  category  but  may  extend 
into the other fields.


6.  MECHANISATION 

Mechanisation  is  increasing  rapidly  and  many  Agencies  are  now  using  computers  in  their 
operation.  We  shall  hear  of  some  of  these  in  the  later  papers  but  a  few  words  on  the  need 
to  mechanise  are  appropriate  here. 

Many  documentary  units  are  running  very  efficiently  on  manual  systems  and  to  maintain 
balance  we  shall  hear  of  the  operations  of  one  or  two.  There  is  a  very  real  danger  that 
a  manual  system  that  is  not  operating  satisfactorily  will  be  automated  to  improve  its 
efficiency.  The  result  is  unlikely  to  be  satisfactory  for  a  poor  manual  system  will,  when 
put  on  a  computer  without  reorganisation,  be  even  less  efficient  and  will  cost  some  ten 
times  as  much. 

In  an  organisation  a  case  for  a  computer  is  frequently  made  not  on  the  basis  of 
retrieval  but  on  the  other  operations  such  as  the  preparation  of  Abstract  Journals,  Indexes, 
a considerable number of Bibliographies or other such lists and finally on “House-Keeping”.
House-Keeping will include circulation control, “security” check, stock control and other
similar  functions.  Retrieval  will  certainly  be  done,  using  the  information  fed  in  for  the 
other  operations  but  this  alone  is  unlikely  to  be  financially  attractive  unless  the  operation 
is very large indeed or is a “Selective Dissemination of Information” (S.D.I.) operation
with  a  considerable  number  of  customers  (S.D.I.  of  course  is  a  specialised  retrieval  search). 


7.  INDEXING 

Indexing  is  one  of  the  most  difficult  problems  in  documentation  for  unless  a  paper  is 
competently  indexed  it  is  lost  in  the  system.  For  some  years  subject  specialists  will  be 
required  in  documentation  centres  for  this  work.  Good  men  are  scarce  and  expensive  but  as 
they  are  also  required  for  abstracting  and  circulation  recommendation  the  indexing  is  but 
a  small  addition  to  the  load.  As  abstracting  becomes  less  necessary  and  circulation  is 
taken  over  by  S.D.I.  the  pressure  to  develop  mechanised  indexing  will  increase. 

7.1 Mechanical Indexing

Much  work  is  being  done  on  this  subject  but  nothing  has  yet  been  demonstrated  which  is 
suitable for large collections. KWIC (Key Word in Context) indexes, where the significant
words of a title are taken in turn, sorted into alphabetical order and printed out with the
remainder of the title, are very good for current awareness but fail for large collections.
Other  programmes  are  available  which  give  a  more  attractive  layout  but  suffer  from  the  same 
defect  that  the  resultant  indexes  are  very  much  diluted  by  non-significant  words  and  that 
the  words  of  the  title  may  be  a  poor  description  of  the  subject.  The  programmes  certainly 
eliminate  a  number  of  non-significant  words  but  the  problem  remains  that  a  word  may  be 
significant  in  one  title  but  useless  in  another.  Titles  now  have  more  meaning  but  the 
author recently had a microfiche where the title of the report was “Fuzzy Sets”. N.A.S.A.
in STAR does rewrite titles as “Informative” titles and this is a considerable improvement
and  it  would  be  interesting  to  hear  whether  any  experimental  work  is  being  done  in  using 
these  titles  for  retrieval. 
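
The KWIC procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not any particular existing programme; the stop-word list and the function name kwic_index are assumptions made for the example.

```python
# Hypothetical list of non-significant words; real programmes use longer lists.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def kwic_index(titles):
    """Return (keyword, title) pairs sorted alphabetically by keyword."""
    entries = []
    for title in titles:
        for word in title.split():
            key = word.strip('.,;:"()').lower()
            if key and key not in STOP_WORDS:
                # Each significant word becomes an entry point to the full title.
                entries.append((key, title))
    return sorted(entries)


if __name__ == "__main__":
    titles = ["Fuzzy Sets", "Storage and Retrieval of Reports"]
    for key, title in kwic_index(titles):
        print(f"{key:<12}{title}")
```

A report titled "Fuzzy Sets" can be entered only under "fuzzy" and "sets", which illustrates how far the value of such an index depends on the words the author happened to choose for the title.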

A  word  count  has  been  proposed.  The  words  of  the  abstract  or  of  the  reports  are  surveyed 
and  those  occurring  most  frequently  are  sorted  and  printed  out  as  indexing  terms.  Like  KWIC 
indexing, this produces a much diluted index and on occasions fails because the significant
words  occur  infrequently  or  are  replaced  by  synonyms. 
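
The word-count proposal can be sketched as follows. This is a minimal Python illustration under stated assumptions: the stop-word list, the sample texts and the function name frequency_terms are hypothetical.

```python
import re
from collections import Counter

# Hypothetical list of non-significant words.
STOP_WORDS = {"a", "and", "in", "is", "of", "the", "to", "with"}


def frequency_terms(text, n=3):
    """Return the n most frequent significant words as candidate index terms."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


if __name__ == "__main__":
    abstract = ("The wing loading of the wing affects the wing stall. "
                "Stall speed rises with loading.")
    print(frequency_terms(abstract))
    # Synonyms are counted separately, so neither spelling need dominate.
    print(frequency_terms("aircraft aeroplane aeroplane", n=1))
```

The second call shows the weakness noted in the text: "aircraft" and "aeroplane" are counted as different words, so the count of the underlying concept is split between its synonyms.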

It is the author’s opinion that the solution to this problem will come from the work of
the machine translators, perhaps by a modification of the highly developed “dictionary
look-up” procedures and word association. The solution may result in the allocation, by
computer, of conventional terms but the final aim should be to develop linguistic programmes
so that the input may be in plain language and that the “dictionary” is used to scan, say,
the abstract and to obtain a match despite synonyms.
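
A dictionary look-up that obtains a match despite synonyms might be sketched as follows. This Python fragment is only an illustration of the idea; the small dictionary, the preferred terms and the function names index_terms and matches are assumptions, and a working system would need a far larger dictionary and some treatment of word association.

```python
import re

# Hypothetical dictionary: each known word maps to one conventional term.
DICTIONARY = {
    "aeroplane": "aircraft",
    "airplane": "aircraft",
    "aircraft": "aircraft",
    "motor": "engine",
    "engine": "engine",
}


def index_terms(text):
    """Scan plain-language text and return the conventional terms found."""
    words = re.findall(r"[a-z]+", text.lower())
    return {DICTIONARY[word] for word in words if word in DICTIONARY}


def matches(request, abstract):
    """True if every concept recognised in the request occurs in the abstract."""
    wanted = index_terms(request)
    return bool(wanted) and wanted <= index_terms(abstract)


if __name__ == "__main__":
    abstract = "Tests of an aeroplane engine at altitude"
    print(matches("airplane motor", abstract))
    print(matches("airplane motor", "Fuzzy sets"))
```

The request and the abstract share no word in common, yet the first call succeeds because both are reduced to the same conventional terms before comparison.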


8. NEWER DEVELOPMENTS

A  number  of  the  newer  developments  deserve  mention  for  not  all  will  be  known  by  the  large 
range of customers of documentalists.

8.1 Information Analysis Centres

Information  Analysis  Centres  have  grown  up  to  meet  the  needs  for  detail  advice  in 
specialist  fields.  We  shall  hear  of  the  working  of  several  of  these  and  it  is  sufficient 
here  to  say  that  they  are  usually  based  on  a  University  or  Establishment  working  in  a 
specialised  field. 

8.2  Referral  Centres 

Referral Centres have rationalised the practice of most documentary units of referring
some enquiries to other agencies better able to assist: they accept enquiries, transfer
them to the Unit working in the appropriate field and monitor the results.

8.3 Selective Dissemination of Information

Selective  Dissemination  of  Information  is  not  new  and  a  large  number  of  Libraries  and 
Information Units keep a “field-of-interest” register which is used to route the new
material. However, S.D.I. is the term now applied to the large operations which are now
possible by computer, of which we shall hear. Basically S.D.I. is a retrieval operation but
the questions asked of the computer are “profiles” of each customer defining his interest.

The “profiles” have to be built up by allocating index terms so that the search picks out
the reports likely to be of interest to the particular scientist. Difficulties arise in
establishing and, in particular, keeping profiles up to date and experience seems to indicate
that the profile of a group of scientists produces more satisfactory results than individual
profiles. 
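
The matching of new reports against profiles can be sketched as follows. This is a minimal Python illustration; the profile names, the index terms, the min_hits threshold and the function name route are hypothetical and chosen only for the example.

```python
def route(report_terms, profiles, min_hits=1):
    """Return the names of the profiles that a new report matches."""
    terms = {term.lower() for term in report_terms}
    matched = []
    for name, profile_terms in profiles.items():
        hits = terms & {term.lower() for term in profile_terms}
        if len(hits) >= min_hits:
            matched.append(name)
    return sorted(matched)


# Hypothetical profiles: one for a group, one for an individual.
PROFILES = {
    "Aerodynamics Group": ["flutter", "wings", "boundary layers"],
    "Dr A (individual)": ["icing"],
}

if __name__ == "__main__":
    print(route(["Wings", "Icing"], PROFILES))
    print(route(["wings", "flutter"], PROFILES, min_hits=2))
```

Raising min_hits makes the search more selective; a group profile, having more terms, is less likely to miss a relevant report when one member's interests shift than a narrow individual profile that has not been brought up to date.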

8.4  Citation  Indexing 

Citation Indexing is comparatively new and is a computer operation producing an index of
all  papers  in  which  reference  is  made  to  one  older  publication.  Thus  a  scientist  knowing 
one  authority  in  the  field  in  which  he  is  interested  can  turn  up  the  author  and  title  of  all 
later papers which give the original report as a reference. The process is expensive in
preparation  and  time  consuming  in  use  for  not  all  references  will  be  on  the  main  subject  but 
in  some  cases  it  will  produce  papers  missed  by  other  searches.  It  is  certainly  an  additional 
bibliographic  tool  which  in  some  cases  will  be  very  valuable  Indeed.  However,  despite  some 
claims,  it  cannot  replace  other  methods  of  search. 
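
In essence a citation index is the inversion of each paper's reference list. The sketch below
shows that inversion only; the function name `build_citation_index` and the sample papers are
hypothetical, and a production index would of course also carry authors, titles and sources.

```python
from collections import defaultdict


def build_citation_index(references):
    """Invert 'paper -> works it cites' into 'cited work -> later papers citing it'."""
    index = defaultdict(list)
    for citing_paper, cited_works in sorted(references.items()):
        for cited in cited_works:
            index[cited].append(citing_paper)
    return dict(index)


# Hypothetical reference lists taken from three later papers.
references = {
    "Paper B (1966)": ["Report A (1960)"],
    "Paper C (1967)": ["Report A (1960)", "Paper B (1966)"],
    "Paper D (1967)": ["Report X (1958)"],
}

citation_index = build_citation_index(references)

# A scientist who knows only Report A can now turn up the later papers citing it.
print(citation_index["Report A (1960)"])
```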

8.5 Microform

Photography has been used for many years in relation to storage of documents. Records on
film (35 mm, 16 mm or cut film) have been made to duplicate records for preservation or for
easy transmission. Some systems too have been developed for storage and retrieval. The
important recent development has been the widespread use of microfiche, particularly in the
U.S.A. The standard American fiche is a sheet of cut film 6" x 4" containing 60 images of
approximately A4 size documents, but the older European systems generally used smaller fiche.
Microfiche have the advantages of ease of storage, postage and reading with relatively cheap
equipment. At present, however, particularly in Europe, printing hard copy from them is
difficult and expensive, and scientists appear more reluctant to use them in a reader than
to ask for hard copy. The microfiche images are at a scale of 1:18 or 21, but much greater
reductions are now being demonstrated. Any documentary centre of reasonable size should have
facilities for reading film and microfiche and for taking paper copies of selected pages;
that is, it should have reader-printers. The cost of printing hard copy and of preparing
microfiche is still high, and these tasks will probably be confined, for some time, to a few
centres in each country.


9.  THE  FUTURE 

We  shall  hear  of  several  new  proposals  which  will  influence  the  future  and  the  author  will, 
therefore,  merely  set  down  those  developments  which  he  considers  of  major  importance. 

9.1 Preparation of Reports on Tape

Apart from speeding up the issue of reports which require re-drafting, the final tape can
be fed directly to a computer for documentary processing. At the present time an expensive
process of editing and key punching is required to extract the bibliographic data and present
it to the computer.

9.2 Text Reading

Although  more  expensive  than  the  above,  machine  reading  of  text  typed  on  a  standard  form 
in  special  type  face  will  show  advantages  over  key  punching  for  those  reports  where  tape  is 
not  available. 

9.3  Time  Sharing 

Time sharing on computers, by making use of "dead" computing time occupied by, say, print
out, allows a number of unconnected tasks to be done almost simultaneously. As a further
extension, tasks may be fed to the computer by telephone from a number of remote terminals
and the coded tasks stored until computation time is available. We shall hear of one or two
of the documentary applications.


9.4  Display  Techniques 

Display  techniques  are  advancing  rapidly  and  it  is  already  possible  to  envisage  the  remote 
display  of,  say,  abstracts  in  response  to  a  subject  enquiry. 

9.5  Rapid  Printing 

On  a  more  mundane  level  there  is  a  very  real  need  for  economic  printing  at  a  speed 
compatible  with  computer  speeds.  This  and  display  techniques  will  no  doubt  progress  together 
for  remote  printing  while  computer  typesetting  is  already  in  limited  use. 

9.6  Tape  Exchange 

It is highly desirable that documentary centres shall be able to exchange tapes and so
avoid the duplication of processing that is now required. Unfortunately this is seldom
possible owing to incompatibility of documentary format, indexing or machine language (most
probably all three). The problems of obtaining agreement are considerable, but as they are
not of interest to the customer they will not be enlarged on here. It should be sufficient
to say that several international bodies are taking this subject very seriously indeed and
much effort is being devoted to it.

Finally, it should be noted that for many years there will still be the need for local
information centres without much mechanisation, although there will probably be a tendency
for them to rely more and more on the large fully mechanised central stores.


10.  EDUCATION 

The need for education of Information Officers or Librarians in the operations and in
the customers' needs is now fully recognised, and the recommendations made by the several
bodies considering their problems are now being implemented in most countries. Less well
recognised is the need to educate the customers in the many sources of information that
are now available and the best ways of using them. The author, being a scientist (and
engineer) converted to documentalist, is acutely aware of the deficiencies on both sides
and hopes that the talks on this subject will do something to remove the misunderstandings
which still exist.

Management also requires education, for although a few organisations fully recognise the
importance of documentation they are in the minority. As an example of good practice one
firm  appoints  an  Information  Officer  to  the  team  working  on  a  new  project.  He  attends  the 
meetings  and  is  responsible  for  searching  the  literature  and  feeding  the  team  with  all  the 
appropriate  information  that  is  available. 


The  Information  Officer  is  anxious  to  assist  the  scientist  but  his  task  is  made  easier 
and  the  result  will  be  better  if  the  scientist  is  aware  of  the  documentary  problems.  Some 
specific  recommendations  are:- 

(a)  In report writing use an informative title and include an abstract of some 100-200
words which fully describes the report.

(b)  In making requests for documents be as specific as possible. Thus where the
request arises from a reference, quote originator, report number and, say, author.
This gives sufficient redundancy to allow for a check against, say, transposition
of numbers. If the information available is incomplete, give everything there is
and say "this is all available".

(c)  In asking for a subject search, be as specific as possible and give an indication
of whether the search can be limited by date. If it is to be a detailed search for
obscure information, give some advice on the type of report which may contain it.

In case it should be felt that these recommendations are too elementary, it should be
said that in the author's organisation some 15% of requests are incomplete and require
either a search on inadequate data or reference back. The delay, nuisance value and
increase of costs are quite considerable. Again, T.I.P. was asked to prepare a bibliography
on "Brittle Materials" and it was not until the work had started that the author, by
approaching the British Panel Member, ascertained that interest was only in ceramics.

In  concluding  I  should  emphasise  that  this  is  an  introductory  paper  and  that  some  of  the 
subjects  I  have  touched  on  will  be  dealt  with  much  more  fully  by  later  speakers.  I  hope 
however  that  I  have  given  an  overall  picture  of  the  activities  of  the  several  types  of 
Information  Centres,  the  problems  which  face  the  supplier  of  information  and  some  of  the 
new tools which are becoming available to him.


DISCUSSION


R. Bree: Could you comment on the possibility of microstorage of full texts within the
computer store?

H.F. Vessey: Methods of dense storage on microform have been demonstrated with reductions
of 200:1 and this could be one method. Another is to store information on magnetic tape
and use this to generate a cathode ray display.


C.A. Bell: Are any details of the McGraw Hill Memory Bank on Defence Projects available?

W.C. Christensen: The McGraw Hill system is probably the SHE£TS’ System. A much better
source of information on continuing U.S. research and technology is the Smithsonian
Institution's data bank on research and technology.


SUMMARY


The compilation of documentary card indexes implies that the information
contained in a text can be characterized by means of a certain number of
signs. The set of signs used or available makes up a documentary vocabulary
serving two purposes: on the one hand, characterizing texts (analysis of
subject matter); on the other hand, expressing documentary searches. To
fulfil both tasks, a documentary vocabulary must be extensive (accuracy of
representation) and structured (search strategy). Structured documentary
dictionaries, or thesauri, appear to be the indispensable link between
authors and people requesting information. A few practical methods for
building such thesauri are presented.


THE PROBLEMS POSED BY DOCUMENTARY VOCABULARY
AND THE ORGANISATION OF DICTIONARIES AND THESAURI

P. Levéry


1. THE AIMS OF A DOCUMENTARY VOCABULARY

The principle of every documentary file is to represent the content of documents, that is
to say the information presented by the author, by means of a certain number of signs or
codes. The set of signs that may be used constitutes a documentary vocabulary.

In the file, the whole of the information in a text is reduced to the information
contained in the signs or codes that were retained when the text was indexed. This obviously
entails a certain loss of information, since a number of the notions presented in the text
are excluded from the documentary representation.

If this phenomenon is analysed a little more closely, it becomes apparent that the loss of
information at the time of indexing can arise from two distinct causes:

-  either the loss is deliberate, that is to say the analyst has judged that certain
information contained in the text was not important enough to appear in the file. The
documentalist voluntarily limits the depth of indexing and keeps only the notions he
considers necessary. This limitation may be due either to a high degree of
specialisation of the documentary file (which leads to the exclusion of notions
unrelated to the speciality) or to external constraints (time, volume to be stored,
etc.).

-  or the loss of information is involuntary. This occurs when the documentary vocabulary
placed at the analyst's disposal does not allow the information presented by the author
to be taken into account. Two main reasons may lie behind this impossibility:

-  the notion expressed by the author has no equivalent in the documentary vocabulary.
It is then generally a new notion (and therefore, unfortunately, an important one from
the documentary point of view);

-  the vocabulary is not sufficiently precise or extensive. Certain notions can be
represented only by approximation: codes of the documentary language corresponding to
more general notions are often used.

We thus see that loss of information in a documentary system is an inevitable phenomenon,
inherent in the very function of indexing. Documentary languages should nevertheless be
developed in such a way that this loss can be controlled.

The analyst must be able to set the tolerable loss himself by specifying the necessary
depth of indexing. The loss must not be the consequence of an inadequate or ill-adapted
documentary language.

These considerations have led documentalists to hold that a good documentary vocabulary
must have two essential qualities:

-  the vocabulary must be precise, so that any notion, however specific and detailed,
can find its exact representation.

-  the vocabulary must be extensible, so as to take into account the new notions that
appear in evolving technical and scientific disciplines.

The need to combine these two characteristics has often led to the abandonment of
traditional hierarchical classification systems. Such systems often permit indexing only by
means of general notions and are too rigid to adapt to the evolution of science and
technology.

The two qualities of precision and extensibility are in fact combined in the natural
language used by authors. The fact that a notion, however precise or new, could be expressed
in a document shows that the natural vocabulary contains the terms capable of expressing it.
The set of words used by authors therefore seems to constitute a documentary vocabulary with
the required qualities.

It could certainly be objected that the information contained in a text is represented not
by vocabulary alone but also by logical and syntactic relations. The documentary
representation of a text by a string of natural-vocabulary terms (keywords or descriptors)
with no syntactic links between them therefore appears inadequate. We shall not dispute this
objection.

We shall reply only that some syntax must be brought into play when documentary files
using natural language are created. We shall not develop this point further, whatever its
interest and importance, and shall confine ourselves to the problems posed by the
establishment of documentary dictionaries.


2. BUILDING A KEYWORD VOCABULARY

If it is true that an analyst finds, among the terms used by the author of a text, all the
terms needed for the documentary representation of that text, then building a documentary
dictionary appears simple: the dictionary will consist of all the terms retained during the
indexing of all the texts in a collection.

It builds up progressively as the texts are analysed. The dictionary thus appears as a
consequence of indexing and not as a prerequisite for it. It is a by-product of indexing.

From a practical point of view, this way of building the documentary vocabulary is very
attractive. One can be sure that every term in the vocabulary is useful. Moreover, the
creation of the documentary file can be started without delay, which was impossible with
traditional systems, since the indexing of documents could begin only after a classification
structure had been constructed.

From the foregoing it would seem that the size of the documentary vocabulary, measured in
number of terms, will depend on two factors: the number of documents in the collection and
the depth of indexing, that is to say the average number of terms retained for each
document.

Experience shows that this is not so.

The probability that a new term will appear decreases with the number of documents
already analysed.

After a certain time, the number of new terms becomes practically negligible. These terms,
generally neologisms, concern new notions. It is worth noting in passing that this method
can usefully be applied to detect automatically the appearance of a new concept in a
discipline.

When the dictionary practically ceases to grow, a 'terminal vocabulary' is said to have
been reached.

It remains to be seen how this terminal vocabulary varies as a function of the depth of
indexing.

Here again, experience shows that the depth of indexing has no appreciable influence on
the extent of the documentary vocabulary, and this is easily explained:

For a given depth of indexing, whether a term has been excluded from the list of keywords
depends not on the term itself but only on its greater or lesser importance in the text. The
same word may be taken as a keyword for another text in which it plays a more important
part. Under these conditions the terminal vocabulary will be the same whatever the depth of
indexing. It will merely be reached more or less quickly according to whether the average
number of keywords used per document is higher or lower.

In fact, the terminal vocabulary appears to be characteristic of a discipline. This
observation is particularly important since it ensures the compatibility, and even the
identity, of the different vocabularies used by several documentation centres dealing with
the same discipline but with different depths of indexing (Fig. 1).
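
The growth of the dictionary as a by-product of indexing, and the detection of a terminal
vocabulary, can be sketched as follows. The function names, the `window` and `tolerance`
parameters and the sample keyword lists are illustrative assumptions; the paper gives no
numerical criterion for deciding that the vocabulary has stopped growing.

```python
def vocabulary_growth(indexed_documents):
    """Accumulate keywords document by document; record how many were new each time."""
    vocabulary = set()
    new_counts = []
    for keywords in indexed_documents:
        new_terms = set(keywords) - vocabulary
        new_counts.append(len(new_terms))
        vocabulary |= new_terms
    return vocabulary, new_counts


def reached_terminal(new_counts, window=3, tolerance=0):
    """Assumed criterion: the last `window` documents added at most `tolerance` terms."""
    return len(new_counts) >= window and sum(new_counts[-window:]) <= tolerance


# Hypothetical keyword lists produced by indexing five documents in turn.
docs = [
    ["transistor", "amplifier", "noise"],
    ["amplifier", "feedback"],
    ["transistor", "noise"],
    ["feedback", "amplifier"],
    ["noise", "transistor", "feedback"],
]

vocabulary, new_counts = vocabulary_growth(docs)
print(new_counts)                    # new terms contributed by each document
print(reached_terminal(new_counts))  # has growth stopped over the last three documents?
```

A non-zero entry appearing in `new_counts` after a long run of zeros is the signal the text
mentions: a term never seen before, which may mark a new concept in the discipline.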

3. VOCABULARY OF THE INFORMED AND OF THE UNINFORMED

All methods of documentary searching in files rest on the same principle: retrieving the
documents that have been indexed with the terms of the documentary vocabulary which
characterise the request.

When this vocabulary consists of natural-language terms, the request takes the form of a
certain logical combination of keywords.

It should be noted that this procedure in fact rests on an implicit assumption: the
requester is supposed to know the subject of his question well enough to draw up the list of
keywords that will guide the search.

This assumption does not always hold. Very often the requester has only vague ideas (which
is indeed why he is seeking documentation).

The terminology relevant to the search is often unknown to him, and his request is worded
in terms totally different from those used by the authors whose documents answer it.

A systematic analysis of the vocabulary used by authors and of that found in requests for
documentation shows that there are in fact two distinct vocabularies:

-  the authors' vocabulary, or 'vocabulary of the informed', generally precise and extensive;

-  the requesters' vocabulary, or 'vocabulary of the uninformed', much more limited and
made up of more general terms.

A check carried out in a documentation centre for electronics gave the following results:

Analysis of the documents had yielded a vocabulary of 4683 words. The questions indexed
by the requesters had used only 1260 words, or about 26% of the available vocabulary. Among
these words there were 73 (of a fairly general nature) which alone accounted for 31% of the
words used for the whole set of questions.

This difference between the vocabularies used by authors and by requesters casts doubt on
the quality of a documentary selection based solely on comparing the lists of keywords that
represent the requests and the documents.

It may be noted in passing that this particular difficulty did not arise when
hierarchical classification systems were used. Such systems readily supplied, for each
notion, the list of more specific, more general or simply neighbouring notions. The very
structure of the classification system served as a guide to the requester and eased the work
of indexing requests. It made it possible to find a common level of precision for
representing documents and queries, which ensured the link between the vocabulary of the
informed and that of the uninformed.

When natural-language terms are used for creating and interrogating files, the link
between the two vocabularies must be made explicit. Dictionaries must therefore be
constructed which make it possible to find the terms of the authors' vocabulary that are
implicitly concerned by any term appearing in a request.

This amounts to saying that, for every term of the documentary vocabulary, a list must be
drawn up of the terms that bear some relation of meaning to it. The structured dictionaries
formed by the set of these lists are called THESAURI.

These thesauri can be used in two different ways:

-  a first technique is to use the thesaurus when the texts are indexed. The indexing list
of each document is made to include certain terms which do not appear in the document
itself but which could occur in a question for which the document would be relevant.
This use of the thesaurus at input is generally accompanied by some standardisation of
the indexing vocabulary, in particular the reduction of synonyms. Such an arrangement
of course presupposes that the thesaurus already exists and was compiled before the
documents were indexed.

-  a second method is to consult the thesaurus at the time of interrogation. All the terms
semantically related to those of the request, and which may have been used in indexing
the documents, are looked up. This method has the advantage that the creation of the
files does not depend on the prior establishment of the thesaurus. The usefulness of
the thesaurus appears only at the time of interrogation, once the file has been built.

The choice between the two methods depends in fact on economic factors:

If the document collection is limited and the number of requests large, it is worth using
the thesaurus at input, once and for all, during indexing.

If, on the contrary, the number of documents is very large compared with the number of
requests, it is preferable to use the thesaurus during interrogation.
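
The two ways of using a thesaurus can be compared in a small sketch. Everything here is a
hypothetical illustration: the thesaurus is assumed to map a general term to more specific
ones, a search is assumed to succeed when a document shares at least one term with the
question, and the function names and sample data are invented for the example.

```python
# Assumed thesaurus: general term -> more specific terms used by authors.
THESAURUS = {
    "semiconductor device": {"transistor", "diode"},
}


def invert(thesaurus):
    """Specific term -> general terms, for enriching documents at input."""
    inverse = {}
    for general, specifics in thesaurus.items():
        for specific in specifics:
            inverse.setdefault(specific, set()).add(general)
    return inverse


def expand(terms, relations):
    """Add to `terms` every term that `relations` associates with one of them."""
    expanded = set(terms)
    for term in terms:
        expanded |= relations.get(term, set())
    return expanded


def search(index, query_terms):
    """Ids of the documents sharing at least one term with the question."""
    return sorted(doc for doc, terms in index.items() if terms & set(query_terms))


documents = {"D1": {"transistor", "noise"}, "D2": {"diode"}, "D3": {"antenna"}}
query = {"semiconductor device"}  # a general term typical of a requester

# Without a thesaurus the requester's general term matches nothing.
hits_plain = search(documents, query)

# Method 1: thesaurus used at input, once per document.
enriched = {doc: expand(terms, invert(THESAURUS)) for doc, terms in documents.items()}
hits_input = search(enriched, query)

# Method 2: thesaurus used at interrogation, once per question.
hits_query = search(documents, expand(query, THESAURUS))

print(hits_plain, hits_input, hits_query)
```

Both methods retrieve the same documents here; what differs is where the work is done, once
per document or once per question, which is the economic trade-off described above.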


4. BUILDING THE THESAURI

Three types of semantic relation can be distinguished between the terms of the
documentary vocabulary:

-  relations of equivalence, which lead to the creation of dictionaries of synonyms;

-  relations of inclusion, which express the hierarchical relations between terms and
correspond to the different degrees of generality of notions;

-  relations of neighbourhood, which make it possible to state that several terms, without
being synonyms or depending hierarchically on one another, cover a certain number of
common concepts.

These different types of relation must of course appear in thesauri, but there are
different methods for bringing them out.

(a) Dictionaries of synonyms

Experience shows that linguistic dictionaries of synonyms are difficult to use for
documentary purposes. They generally cover very wide fields in which synonymy is taken in a
strict sense. In documentation, on the contrary, the semantic field is more specialised and
documentary synonymy is broader. It is enough to be able to state that every document
indexed with word A must be considered for every request indexed with word B (and vice
versa) for A and B to be regarded as documentary synonyms.

The dictionary of synonyms can be created quite simply in the very course of creating the
documentary vocabulary:

For each new word regarded as a keyword, a quick, non-exhaustive search is made for its
documentary synonyms. This gives a list, necessarily incomplete, of the terms expressing the
same notion. A notion number is assigned to this list.

If, during the indexing of later documents or the processing of a request, a new term
appears which is a synonym omitted from the earlier list, a similar procedure will
necessarily be applied to it. During this second search it is highly improbable that none of
the words of the first list will appear (one may forget a word in a list, but not a whole
list). There will therefore be at least one term of the vocabulary which is regarded as a
synonym for two distinct notions and which is given two different notion numbers. This
anomaly can very easily be detected by automatic means, which leads to a practically
automatic correction of the dictionary of synonyms.
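
The automatic check described above amounts to looking for terms that carry more than one
notion number. A minimal sketch follows; the function names, the rule of keeping the lowest
notion number when merging, and the sample lists are assumptions made for illustration and
are not taken from the paper.

```python
from collections import defaultdict


def find_conflicts(notion_lists):
    """Return {term: notion numbers} for every term filed under several notions."""
    notions_of = defaultdict(set)
    for number, terms in notion_lists.items():
        for term in terms:
            notions_of[term].add(number)
    return {term: sorted(numbers)
            for term, numbers in notions_of.items() if len(numbers) > 1}


def merge_conflicts(notion_lists):
    """Merge synonym lists that share a term (assumed rule: keep the lowest number)."""
    merged = {number: set(terms) for number, terms in notion_lists.items()}
    while True:
        conflicts = find_conflicts(merged)
        if not conflicts:
            return merged
        # Resolve one conflict, then recompute, since the merge may create others.
        keep, *drop = next(iter(conflicts.values()))
        for number in drop:
            merged[keep] |= merged.pop(number)


# Hypothetical synonym lists: list 2 was drawn up later for the forgotten term "airplane".
notion_lists = {
    1: {"aircraft", "aeroplane"},
    2: {"airplane", "aeroplane"},
    3: {"wear", "abrasion"},
}

conflicts = find_conflicts(notion_lists)
merged = merge_conflicts(notion_lists)
print(conflicts)
print(merged)
```

In practice a documentalist would probably want to confirm each merge, since a term shared
by two lists may occasionally be a genuine homonym rather than a forgotten synonym.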

(b) Hierarchical relations

Relations of inclusion between the terms of the documentary vocabulary can be brought out
relatively simply by using the properties of traditional hierarchical classification
systems.

In a given semantic field, each term of the vocabulary has a certain meaning which can be
attached to a heading of a classification system. It is enough to treat the meaning of the
term as being analogous to a document that one is seeking to classify. All terms having
hierarchical relations with one another will then be found classified under hierarchically
linked headings, which considerably eases the work of drafting the thesaurus.

Another method is to build the thesaurus progressively as interrogations are made.

The documentalists who index requests are in the habit of breaking up general keywords
and replacing them by a list of more specific keywords linked by logical ORs. In doing this
work they are in fact building instantaneous micro-thesauri which express the hierarchical
relations between the general keywords of the question and the more specific words of the
documentary vocabulary.

If it is decided to keep and store these associations systematically, a thesaurus of
hierarchical relations is obtained. The more the documentation centre is interrogated, the
more complete this thesaurus will be. We thus see that the thesaurus can be built from the
analysis of requests, in a way quite parallel to what we saw for the establishment of the
documentary vocabulary, which was obtained from the analysis of documents.

This method has been used automatically in a documentation centre which held a large
collection of 'profiles' intended for selective dissemination. For each term, a search was
made for the list of terms frequently attached to it by an OR.

The lists obtained in fact constituted a thesaurus.
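
The automatic derivation from stored profiles can be sketched as a count of how often two
terms occur in the same OR-group. The representation of a profile as a list of OR-groups,
the `min_count` threshold standing in for "frequently", and the sample profiles are all
assumptions made for this illustration.

```python
from collections import Counter
from itertools import combinations


def or_associations(profiles, min_count=2):
    """Link terms that appear together in an OR-group in at least min_count groups."""
    pair_counts = Counter()
    for or_groups in profiles:
        for group in or_groups:
            for a, b in combinations(sorted(set(group)), 2):
                pair_counts[(a, b)] += 1
    thesaurus = {}
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            thesaurus.setdefault(a, set()).add(b)
            thesaurus.setdefault(b, set()).add(a)
    return thesaurus


# Hypothetical profiles: each profile is a list of OR-groups (the groups being ANDed).
profiles = [
    [["transistor", "diode", "thyristor"], ["noise"]],
    [["transistor", "diode"]],
    [["diode", "antenna"]],
]

thesaurus = or_associations(profiles)
print(thesaurus)
```

With only three profiles the threshold keeps a single association; a large profile
collection, as in the centre mentioned above, would be needed for the lists to become useful.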

(c)  Relations  de  voisinage 

Ce  type  de  relation  ^tait  explicit^  dans  les  structures  documentaires  traditionnelles 
par  des  notations  du  type  “Voir  aussi’ .  Elies  indiquent  qu’  il  exists  une  certaine  analogie 
de  signification  (et  nor  une  Equivalence)  entre  deux  termes.  Par  example  RESISTANCE 
MECANIQUE  et  USURE. 

Ces  relations  sont  en  general  nettement  mises  en  Evidence  lorsque  Ton  utilise  la 
technique  dEj4  mentlonnee  qui  consists  a  ramener  les  termes  du  vocabulaire  a  une  structure 
classlflcatolre:  deux  termes  volstns  sont  classEs  &  1'  aide  de  la  mEme  rubrique. 

A more elaborate method consists in systematically comparing the definitions of the terms. If the definitions of two words have a certain number of points in common, one can deduce that the two words are neighbours of each other. This method has been used to determine neighbourhood relations in a medical vocabulary.

Each word encountered during the indexing of texts was itself indexed using a basic vocabulary of definers (fairly general ones, as it happens) that specified the meaning of the word. It was then possible to search automatically, for each term, for all the terms that had an identical definition (synonyms) and all those that had definers in common (neighbouring terms). This method had the further advantage of defining 'degrees of neighbourhood': the greater the number of common definers, the closer the neighbourhood.

This notion of 'degree of neighbourhood' proved particularly important when the resulting thesaurus was used to process documentation requests. It was possible to offer the enquirer the option of broadening his question to a greater or lesser extent, simply by taking into account the terms more or less close to those he had used to word his question.
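A minimal sketch of the definer comparison and of question broadening, assuming each term's definition is stored as a set of definers. The medical terms, their definers and the function names are invented for illustration and are not taken from the vocabulary mentioned in the paper.

```python
def neighbourhood(definitions, term):
    """Return (synonyms, neighbours) of `term`.

    Synonyms have exactly the same definers; neighbours share at least one.
    Neighbours are (other_term, degree) pairs, closest first, where degree
    is the number of definers in common.
    """
    own = definitions[term]
    synonyms, neighbours = [], []
    for other, definers in definitions.items():
        if other == term:
            continue
        if definers == own:
            synonyms.append(other)
        else:
            shared = len(own & definers)
            if shared:
                neighbours.append((other, shared))
    neighbours.sort(key=lambda pair: (-pair[1], pair[0]))
    return sorted(synonyms), neighbours


def expand_query(definitions, term, min_degree):
    """Broaden a one-term question to synonyms and sufficiently close neighbours."""
    synonyms, neighbours = neighbourhood(definitions, term)
    return [term] + synonyms + [t for t, degree in neighbours if degree >= min_degree]


# Hypothetical definitions: term -> set of definers.
definitions = {
    "MYOCARDITIS": frozenset({"HEART", "MUSCLE", "INFLAMMATION"}),
    "CARDIAC MYOSITIS": frozenset({"HEART", "MUSCLE", "INFLAMMATION"}),
    "PERICARDITIS": frozenset({"HEART", "MEMBRANE", "INFLAMMATION"}),
    "HEPATITIS": frozenset({"LIVER", "INFLAMMATION"}),
    "FRACTURE": frozenset({"BONE", "RUPTURE"}),
}
synonyms, neighbours = neighbourhood(definitions, "MYOCARDITIS")
narrow = expand_query(definitions, "MYOCARDITIS", min_degree=2)
broad = expand_query(definitions, "MYOCARDITIS", min_degree=1)
```

Lowering `min_degree` corresponds to broadening the question: `broad` adds the more distant neighbour that `narrow` leaves out.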


5.  CONCLUSION 

It is certain that the use of automatic means and of natural vocabulary has made it possible to solve certain documentation problems and opens up promising prospects. It remains true, nevertheless, that these new techniques raise new problems, among which the construction of documentary dictionaries and thesauri is one of the most difficult and most urgent. There is no hiding the fact that solving this problem demands extensive resources, but we believe that the importance of the documentation problem amply justifies the efforts that remain to be made in this direction.

APPENDIX

The Documentation Centre of IBM France,
Centre d'Etudes et Recherches, La Gaude

R. J. Dubon


1. EVOLUTION OF TECHNIQUES

The mission of a traditional documentation centre can be summed up in four main tasks:

1. Receive documents on their arrival at the Centre: analysis, classification, storage.

2. Keep the Centre's clients constantly informed of new publications of direct interest to them (Selective Dissemination of Information).

3. Carry out, at the clients' request, bibliographic searches on a given subject (Retrospective Searches).

4. Be able to supply the 'address' of a given document and the document itself.

In a FIRST GENERATION, using manual methods, one proceeded essentially by HIERARCHICAL CLASSIFICATION. These techniques can be justified for information banks that are limited in volume and specialized in nature. Beyond a certain input volume, and for information covering a wide range of activities, manual systems and their various classification schemes prove ineffective, slow and costly.

A first attempt at automation was made with the SECOND GENERATION, with the introduction of KEYWORDS. In this step towards complete automation, a technical document was characterized by a series of keywords (10 on average) and by references such as title, date and authors' names. This set of data was stored in the computer's memory and then searched by program. There is no need to dwell on this method, because the results were disappointing: only 10% of the references found were observed to answer the questions asked. The computer 'output', like the 'input', consisted of a list of keywords that was not informative enough to describe the document adequately and was consequently a source of confusion with the true subject of interest. Moreover, any system using hierarchical classification or keywords is dangerous in that it relies on human analysis, a source of errors; finally, advanced technologies often call on new notions that only the passage of time will make it possible to assess.

This is serious, because a document that has been badly analysed and indexed will be lost for ever to any future query, unless the query is itself badly composed. Finally, the need for manual intervention limits the acceptable input volume, and the drawbacks of the First Generation reappear.

This method of searching on keywords has now been completely abandoned.

The THIRD GENERATION uses so-called 'normal text' or 'plain language' techniques, the subject of our discussion on documentation vocabularies.


2. PROCESSING OF THE COMPLETE DOCUMENT AT
SYSTEM INPUT

Only the abstract of the document is processed by computer. The complete document is destroyed after IBM has reduced and photographed it onto a suitable microfilm medium (microfiche). This operation is carried out in the USA and does not cover articles from journals and periodicals, for reasons of reproduction rights.

A complete set of these microfiches is then sent to the European Centre, as well as to the various Documentation Services of the IBM Company around the world that are clients of the system.


3. PROCESSING OF ABSTRACTS

In the subsequent operations it is therefore the ABSTRACT of the document that is processed automatically and stored in the computer's memory. It is thus truly the 'author's thought', expressed in plain language, that will be 'read' by the machine; the machine will then be able to 'answer' its clients' questions, which are put by means of a query logic described later.

The importance of the abstract is therefore very great, because the results of a search depend on its quality.

At present most serious scientific and technical publications (reports, theses, journal articles, etc.) have an abstract. It is, moreover, a rule within the IBM Company. In any event, our system does not take account of documents that have no abstract.

The author of a document is supposed to be the person best qualified to write its abstract and thereby to make it as 'informative' as possible. The abstract comprises:

- Title, date

- Author(s), origin of the document

- The document's own number, various identifications

- The abstract proper, of 10 to 30 lines (100 to 300 words), followed by the number of pages

- A number of descriptors (2 to 5), including a category number, intended for the preparation of catalogues by subject and by category (list of the 23 categories in Annex 2)

- A chronological sequential accession number, purely artificial, used to identify the abstract in the computer as well as the microfilm medium on which the complete document is held.

The abstracts are then stored in the computer's memory by way of 'punched card' and 'magnetic tape' media, in the following operations:

- Punching of the abstracts (20 cards per abstract on average), following rules imposed by the various operating programs of the Automatic Documentation Retrieval system

- Transfer to tape, with simultaneous checking of the following:

  Validity of the characters

  Sequence numbers in ascending numerical order

  Composition of the plain text: presence of all the components of the abstract

  Spelling: all new words are compared with a 'dictionary magnetic tape' containing about 1 100 correctly spelled words.

When an anomaly is detected (misspelled word, punching error, sequence error, new word, etc.), it is listed on a printer so that specialists can correct it; a new word is, for example, added to the dictionary on tape, an error is corrected, and so on.
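The four checks made during transfer to tape can be sketched as below. This is a minimal illustration, not the IBM program: the record layout, the zone names in `REQUIRED_ZONES` and the permitted character set in `VALID_CHARS` are assumptions, since the paper does not specify them.

```python
import re

# Assumed zone names and assumed punched-card character set.
REQUIRED_ZONES = ("title", "author", "origin", "abstract")
VALID_CHARS = re.compile(r"^[A-Z0-9 .,;:()'$/-]*$")


def check_abstract(record, previous_seq, dictionary):
    """Return the list of anomalies to print for one punched abstract."""
    anomalies = []
    # Sequence numbers must be in ascending numerical order.
    if record["seq"] <= previous_seq:
        anomalies.append(f"sequence error: {record['seq']} after {previous_seq}")
    # Every component of the abstract must be present.
    for zone in REQUIRED_ZONES:
        if not record.get(zone):
            anomalies.append(f"missing zone: {zone}")
    for zone in REQUIRED_ZONES:
        text = record.get(zone, "")
        # Character validity.
        if not VALID_CHARS.match(text):
            anomalies.append(f"invalid character in zone: {zone}")
        # Spelling: any word absent from the dictionary is reported.
        for word in re.findall(r"[A-Z]+", text):
            if word not in dictionary:
                anomalies.append(f"new or misspelled word: {word}")
    return anomalies


dictionary = {"MAGNETIC", "TAPE", "UNIT", "DUBON", "IBM"}
record = {
    "seq": 101,
    "title": "MAGNETIC TAPE",
    "author": "DUBON",
    "origin": "IBM",
    "abstract": "MAGNETIC TAPE UNITT.",
}
anomalies = check_abstract(record, previous_seq=100, dictionary=dictionary)
print(anomalies)
```

As in the paper, the program only reports: deciding whether an unknown word is a misspelling to correct or a new word to add to the dictionary is left to the specialist reading the listing.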

Once all these functions have been checked, bulletins (complete abstracts) and catalogues by subject, category, author, origin and accession number are produced by the computer and sent to the same recipients as the microfiches containing the complete documents (Fig. 1).

Finally, as the last step, a series of magnetic tapes corresponding to the abstracts of the new documents is prepared by type of document. One high-density magnetic tape (800 BPI) holds about 6 000 document abstracts.

The La Gaude Centre then receives from the United States a copy of these tapes, which are added to the 'French' tapes. These new documents are then merged, on both sides of the Atlantic, with the 'historical' files and constitute the computer's 'documentation memory'.

N.B. - Tape updates are transmitted from the USA to La Gaude by IBM 7702 magnetic tape transmission units, linked to each other by an ordinary telephone line.


4. CHARACTERISTICS OF THE SYSTEM
4.1 Simplified Input

- The author's thought, in the original language and in plain language, is put directly onto magnetic media with a minimum of human intervention.

- The search is carried out on the normal text of all the elements of the abstract.

- No coding, no classification and no keyword is needed any longer; the abstracts are held on magnetic tape in chronological order, whatever their nature.

4.2 Search Logic

Flexible and effective, it allows the documentary files on tape to be consulted, likewise in plain language.

- This logic is based on the satisfaction of CONCEPTS, whose number and nature describe the problem posed.

- The question language uses a number of Logical Operators, which make it possible to define a question however complex it may be.

- Questions can be put, in plain language, in the language of the document held in memory. At present French and English are used jointly for this purpose. This facility can be extended to any language.

4.3 Processing Speed

The computer makes it possible to put an average of 100 questions simultaneously, to read 120 000 words of text per minute and to print the results on high-speed printers.
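As a rough order of magnitude, the figures quoted in this paper (about 6 000 abstracts per tape, 100 to 300 words per abstract, 120 000 words read per minute) suggest how long one pass over a full tape would take. The calculation below is only an estimate derived from those figures; it ignores titles, descriptors and tape handling, and no such timing is stated in the paper.

```python
ABSTRACTS_PER_TAPE = 6_000   # figure given in the paper
WORDS_PER_MINUTE = 120_000   # figure given in the paper

# Abstract lengths of 100 to 300 words are given in the paper; 200 is a midpoint.
scan_minutes = {
    words: ABSTRACTS_PER_TAPE * words / WORDS_PER_MINUTE
    for words in (100, 200, 300)
}
for words, minutes in scan_minutes.items():
    print(f"{words} words per abstract: about {minutes:.0f} minutes per tape")
```

Since about 100 questions are processed in the same pass, the reading time per question is correspondingly small.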

4.4 Format of Answers

It is identical to that of the input, that is, it consists of an abstract in plain language, with an indication of the address of the complete document (microfiche medium).


5. SEARCH LOGIC (FIG. 2)

The selection of a document abstract is based on the satisfaction of a 'selection criterion'. This criterion requires the simultaneous presence of one or more concepts within the abstract.

Let us call an ELEMENTARY CONCEPT an isolated WORD of natural language (in any language). A CONCEPT will be the union of elementary concepts, linked to one another by a number of LOGICAL OPERATORS.


6. VARIOUS POSSIBLE CONCEPTS

6.1 (a) The simplest concept is the Elementary Concept. It is possible, for example, to require only the presence of one word in an abstract to ensure its selection.

Example: LASER

N.B. - Care is needed, because too general a term (transistor, computer), with a selection criterion equal to ONE (1), can generate too large an 'output'.

(b) The MASK technique makes it possible to search on the root of an Elementary Concept (word) in order to cover its various possible grammatical forms, by opening a range of characters after the root. Two cases can arise:

Selective mask: indicated by as many ƀ signs as there are characters to mask, it opens a limited range of 1, 2, 3, 4 or 5 characters after the root of the word.

Example: TRANSISTORƀ will cover the word in the singular as well as in the plural; HOPITAƀƀ will cover the words Hôpital or Hôpitaux.

This selective mask is used when one wants to avoid repeating a word in its different forms, these forms and the maximum possible number of characters after the root being known.

Unlimited mask: indicated by the sign $, it opens an unlimited range of characters after the root of the word.

Example: DOCUMENT$ will cover words such as Document, Documents, Documentation, Documentalistes, etc.

Care is needed in using the unlimited mask, because selection may occur on words that share the root but whose ending is such that the word no longer has any meaning in common with the word of the question.

For example, if one is looking for words such as ORAL, ORAUX or ORALEMENT, one must avoid using ORA$, because words such as ORAGE and ORAISON could also be treated as correct answers.
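The two masks can be imitated with regular expressions, as in the sketch below. It is an illustration only and not the system's actual matching routine: the selective-mask sign is replaced here by the ASCII stand-in `#` (an assumption made for convenience), and a selective mask of n signs is read as "up to n further characters", in line with the TRANSISTOR example.

```python
import re

SELECTIVE = "#"   # ASCII stand-in for the selective-mask sign (assumption)
UNLIMITED = "$"   # unlimited mask, as in DOCUMENT$


def mask_to_regex(term):
    """Compile a masked question term into a whole-word regular expression."""
    root = term.rstrip(SELECTIVE + UNLIMITED)
    suffix = term[len(root):]
    if suffix == UNLIMITED:
        tail = r"\w*"                      # any number of characters
    elif suffix and set(suffix) == {SELECTIVE}:
        tail = r"\w{0,%d}" % len(suffix)   # at most one character per sign
    elif not suffix:
        tail = ""                          # plain word, no mask
    else:
        raise ValueError(f"malformed mask in {term!r}")
    return re.compile(r"\b" + re.escape(root) + tail + r"\b", re.IGNORECASE)


def matches(term, text):
    """True if the masked term selects at least one word of the text."""
    return mask_to_regex(term).search(text) is not None


print(matches("TRANSISTOR#", "DEUX TRANSISTORS"))           # plural covered
print(matches("DOCUMENT$", "DOCUMENTATION AUTOMATIQUE"))    # any ending covered
print(matches("ORA$", "UN ORAGE"))                          # the unwanted hit
```

The last call reproduces the warning in the text: the unlimited mask ORA$ also selects ORAGE, which has nothing to do with ORAL.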

6.2 More complex concepts are formed by combining Elementary Concepts linked by Logical Operators.

Logical Operators:

(a) OU (OR). This operator expresses different possibilities, that is, different Elementary Concepts, on the same logical level.

w OU x OU y OU z

Example of a possible CONCEPT: hôpitaƀƀ OU cliniqueƀ

(b) ET (AND). This operator requires the simultaneous presence of several Elementary Concepts or of several CONCEPTS.

a ET b ET c

Example of a possible CONCEPT: ordinateurƀ ET médecine, or again: automatisation ET (hôpitaƀƀ OU cliniqueƀ)

(c) AVEC (WITH). This operator requires the presence of two or more Elementary Concepts within one sentence of text; a sentence consists of all the characters lying between two full stops.

a AVEC b

Example of a CONCEPT: bandeƀ AVEC magnétiqueƀ

(d) ADJ (ADJACENT). The Elementary Concepts affected by this operator must be in adjacent positions, and in order.

a ADJ b ADJ c

(b must follow a and precede c)

Example of a CONCEPT: documentation ADJ automatique, or again: mémoireƀ ADJ à ADJ tambour ADJ magnétique

(e) SAUF (EXCEPT). This operator can be applied to any concept previously defined (elementary or not). It excludes from the final selection any abstract in which the unwanted concept appears, even if the abstract otherwise meets other selection criteria.

Examples of negative CONCEPTS: SAUF (fortran OU cobol); SAUF (centrƀƀƀ ADJ téléphoniqueƀ); SAUF (simulation ET aérospatial)

(f) OUI (YES). This operator can likewise be applied to any concept. It is used in only two cases:

(i) The question already contains a SAUF operator. In this case the OUI operator dominates and imperatively ensures selection, whatever the semantic environment.

(ii) The selection criterion requires two or more concepts. In this case a sufficient condition for selection is met if only the concept carrying the OUI operator is present in an abstract. The presence of the two (or more) concepts otherwise required is no longer necessary.

(g) SEARCH ZONE INDICATORS. When the abstracts are punched, an identification code for each zone of the abstract (title zone, author zone, origin zone, abstract zone, etc.) forms part of the set of characters making up that zone. The purpose is twofold:

to control the page layout at printing;

to allow a search localized within a particular zone of the abstract.

(i) Imperative zone indicator. Applies to any concept. This indicator makes it possible to require the presence of the concept within a preferred zone, specified in advance when the question is put.

For example, the question "CONTROLE ZONE AUTEUR Einstein" will select only the documents of which Einstein was the author, and not those in which Einstein's theory of relativity is mentioned.

(ii) Zone exclusion indicator. Applies to any concept. Conversely, this indicator makes it possible to put a question relating to all the terms of an abstract except those of an unwanted zone, chosen in advance.

For example, the question "SAUF CONTROLE ZONE ORIGINE ondeƀ ADJ électriqueƀ" retrieves the documents dealing with electric waves, except where the match comes only from the name of the journal "Onde Electrique" (unless the article itself deals with electric waves).
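The operators and zone indicators described above can be sketched as small composable functions. This is a hedged illustration, not the system's own program: the function names mirror the operator names, `MOT` (an elementary concept) and `select` are hypothetical helpers, masks and the OUI operator are left out, and a record is assumed to be a mapping from zone name to text.

```python
import re


def _words(text):
    return re.findall(r"\w+", text.upper())


def _sentences(text):
    # The paper defines a sentence as the characters between two full stops.
    return [_words(part) for part in text.split(".")]


def MOT(word):
    """Elementary concept: one isolated word."""
    return lambda text: word.upper() in _words(text)


def OU(*concepts):
    return lambda text: any(concept(text) for concept in concepts)


def ET(*concepts):
    return lambda text: all(concept(text) for concept in concepts)


def AVEC(*words):
    """All the words occur inside one sentence."""
    wanted = {w.upper() for w in words}
    return lambda text: any(wanted <= set(s) for s in _sentences(text))


def ADJ(*words):
    """The words occur side by side, in the given order."""
    seq = [w.upper() for w in words]

    def test(text):
        return any(
            s[i:i + len(seq)] == seq
            for s in _sentences(text)
            for i in range(len(s) - len(seq) + 1)
        )
    return test


def SAUF(concept):
    """Negative concept: true only when the unwanted concept is absent."""
    return lambda text: not concept(text)


def select(record, concept, zone=None, except_zone=None):
    """Apply a concept to a record, optionally to one zone or to all but one."""
    texts = [t for z, t in record.items()
             if (zone is None or z == zone) and z != except_zone]
    return concept(" . ".join(texts))


# Hypothetical record.
record = {
    "title": "Automatic documentation at La Gaude",
    "author": "Dubon",
    "origin": "Onde Electrique",
    "abstract": "Abstracts are stored on magnetic tape. Einstein is quoted once.",
}
print(select(record, ADJ("magnetic", "tape")))
print(select(record, MOT("Einstein"), zone="author"))
print(select(record, ADJ("onde", "electrique"), except_zone="origin"))
```

The last two calls mirror the two zone examples in the text: Einstein is found in the abstract but not in the author zone, and the journal name alone no longer triggers a selection once the origin zone is excluded.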


7. CONSTRUCTION OF A QUESTION PROFILE

This set of logical means makes it possible, by defining the number and nature of the concepts required, to convert the enquirer's interests into a question language that is clear, informative and precise.

The construction of a good question depends essentially on:

- a correct choice of the terminology characteristic of the problem posed. This requires a good knowledge of the subject, together with an appropriate knowledge of the nature of the information bank and of its approximate volume in the various scientific and technical fields that make it up. A thesaurus and a dictionary of synonyms are also effective tools for 'covering' the subject.

- good use of the search logic, according to the subject and the result sought. It is important to know whether the enquirer wants few or many references, so that the question can be narrowed or broadened accordingly. The choice of operators and of the links between them is of prime importance.

By way of example, a comparison of the possibilities offered by the ET, AVEC and ADJ operators is instructive.

Suppose that a researcher is interested in developments in magnetic tape technology. His question can be defined by two concepts:

(a) The concept "technology" and its synonyms presents no difficulty: technologiƀƀƀƀ OU techniqueƀ OU etc.

(b) The concept "magnetic tapes" can be shaded by the choice made among the operators ET, AVEC and ADJ.

(i) The ET operator (bandeƀ ET magnétiqueƀ) will be chosen if the widest possible search is wanted, since we then require the presence of the Elementary Concepts "bandeƀ" and "magnétiqueƀ" anywhere in the abstract. In such an abstract, however, the words "bandeƀ" and "magnétiqueƀ" may be physically far apart and in different semantic contexts, hence a risk of "noise", that is, a document selected but off the subject.

(ii) If the ADJ operator is chosen (bandeƀ ADJ magnétiqueƀ), the computer will select an abstract only if it contains the word "bandeƀ" followed by the word "magnétiqueƀ". In this case we reduce the risk of "noise" but introduce that of "silences", that is, abstracts held in memory but not selected because the logic is insufficiently adapted to human language.

Suppose that an abstract contains the following sentence:

"...Ces résultats proviennent de l'utilisation de nouveaux matériaux magnétiques récemment adoptés pour les dérouleurs de bandes IBM..." (These results come from the use of new magnetic materials recently adopted for IBM tape drives.)

This abstract will not be "retrieved" and notified, although it answers the question perfectly.

(iii) For this particular problem the AVEC operator (bandeƀ AVEC magnétiqueƀ) is desirable, because this logic lies midway between ET, which is too general, and ADJ, which is too restrictive. In this case the abstract mentioned above will be selected, since the Elementary Concepts "bandeƀ" and "magnétiqueƀ" occur within one sentence, between two full stops of the text.
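The trade-off between noise and silence can be checked on the paper's own sentence with a small sketch. The sentence is transcribed in unaccented capitals, as punched text might appear (an assumption); the other two texts and all function names are hypothetical, and the one-character selective mask is imitated by accepting the root plus at most one letter.

```python
import re


def tokens(sentence):
    return re.findall(r"\w+", sentence.upper())


def hit(token, root):
    # One-character selective mask: the root alone or the root plus one letter.
    return token.startswith(root) and len(token) - len(root) <= 1


def evaluate(text, a, b):
    """Report which of ET, AVEC and ADJ would select the text for roots a and b."""
    sents = [tokens(s) for s in text.split(".")]
    flat = [t for s in sents for t in s]
    et = any(hit(t, a) for t in flat) and any(hit(t, b) for t in flat)
    avec = any(
        any(hit(t, a) for t in s) and any(hit(t, b) for t in s) for s in sents
    )
    adj = any(
        hit(s[i], a) and hit(s[i + 1], b) for s in sents for i in range(len(s) - 1)
    )
    return {"ET": et, "AVEC": avec, "ADJ": adj}


# The sentence quoted in the paper.
relevant = ("CES RESULTATS PROVIENNENT DE L'UTILISATION DE NOUVEAUX MATERIAUX "
            "MAGNETIQUES RECEMMENT ADOPTES POUR LES DEROULEURS DE BANDES IBM.")
# Invented off-subject text: the two words occur in different sentences.
noise = ("LA BANDE PASSANTE DU FILTRE EST ETROITE. "
         "LE NOYAU MAGNETIQUE EST EN FERRITE.")
# Invented text with the exact phrase.
exact = "LES BANDES MAGNETIQUES SONT RELUES."

for name, text in (("relevant", relevant), ("noise", noise), ("exact", exact)):
    print(name, evaluate(text, "BANDE", "MAGNETIQUE"))
```

On these three texts ET selects the off-subject one (noise), ADJ misses the relevant one (silence), and AVEC selects the relevant and exact texts while rejecting the off-subject one, which is the behaviour the paper argues for.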

The chart "Logique Recherche 360" illustrates the logical possibilities described above. It calls for a few comments:

- the various concepts, identified by the symbol CONxx, where xx is simply a serial number and not a priority, are here on the same logical level, because the selection criterion has been set at 1. Each of the 7 concepts is therefore independent of the others.

- CON03 is expressed by means of sub-concepts, A3 and A4. Neither of the sub-concepts A3 and A4 is sufficient by itself to ensure a selection, since they must be combined by the ET operator.

  . A4 expresses alternatives (OU logic)

  . A3 brings together on the same logical level, under the OU operator, isolated words (A1) and adjacent words (A2).

This concept shows all the flexibility of this logic, whose complexity, while following strict rules of application that will not be detailed here, can be adapted without limit to the characterization of the problem posed (Figs. 3 and 4).


Fig. 4  Example of a profile

(Figure not reproduced. Legible entries of the profile:)

    CATHODE adj RAY adj TUBES
    Con08   ITIRC
    Con09   DUBON with RJ
    Con10   MAGHINO with JJ
    Con11   MERRITT with CA
    Con12   GARUND with J
    Con13   JACKSON with EB
    End


DISCUSSION 


A.H. Holloway: You have described your system as using natural language, but if the natural language of the enquiry does not correspond with the natural language of the document there must be intervention by a human agent. The dialogue between the user and the system would seem to be essential to the performance of any system.


R.J. Dubon: It is true that up to the present time we have not been able to eliminate the link of the information specialist between the enquirer and the system.


E. Keonjian:  What  are  the  maintenance  problems  of  your  systems,  including  the  error 
detection  in  the  system,  and  the  qualifications  of  the  personnel  required  to  operate  it? 

R.J.Dubon:  Keypunching  is  checked  so  that  we  know  our  tapes  are  correct.  Logic  errors 
can  be  detected  when  preparing  the  search  program.  Staff,  excluding  those  preparing  the 
input,  consists  of  two  engineers,  a  clerk  and  a  secretary. 


R.C. Wright: How many persons are engaged on input processing to the system?

R.J. Dubon: Fifteen persons are engaged on keypunching, proof reading, merging and dispatching the input. No abstracting is done; the abstract accompanying the article is used.


R.D. Kerr-Waller: We have found that the use of Boolean logic in a search produces false drops. A change to a system of weighted keywords resulted in a considerable reduction in the number of false drops.

R.J. Dubon: We also have noted errors when using Boolean logic but these have not been serious. Use of a "with" logic has given good results.


R. Brce: 1. What is the total input to your system?

2. What influence does the use of several languages have on the economy of the system?

R.J. Dubon: 1. Input is 3,000-4,000 items per month.

2. Foreign languages would not affect the economy of the system as the system operates in the English language and questions are translated into English.


E.Lapeysen:  Is  there  any  screening  of  the  output  from  a  question  put  to  the  system? 


R.J.Dubon:  No;  the  output  is  sent  direct  to  the  enquirer. 


* 


PAPER  4 

FOUR "NEW" SCIENCES: AN APPROACH TO COMPLEXITY*

by


E. B. Montgomery


Dean, School of Library Science,
Syracuse University, New York, USA


* Dean Montgomery was not able to deliver this paper at the Symposium but it is included to complete the Proceedings. In Dean Montgomery's absence, Dr. E.L. Eichhorn of Jet Propulsion Laboratory, USA spoke on Integrated Data Management in the Deep Space Net. As this was a shortened version of a paper intended for full presentation elsewhere, it is not thought appropriate to include it in these Proceedings.

SUMMARY


Present-day discoveries are causing an exponential growth in information because a discovery in one field may very well lead to new discoveries and the generation of new information in other, related fields and even in other disciplines. The problem is one of coping with complexity.

Four "new" sciences, not new in themselves but looked at from new points of view, are suggested as solutions, namely Information science, Communication science, System science and Application science.

Finally, it is suggested that the Science of science itself may provide answers to some of the problems.

FOUR "NEW" SCIENCES: AN APPROACH TO COMPLEXITY
E. B. Montgomery


The problem to be faced in scientific and technical information involves far more than the solution of information storage and retrieval problems of our technical culture. We are currently faced with the complexity that results from the exponential increase in knowledge. We have not learned to cope with it. And yet, these increases force us to expand our search for more knowledge.

Research and development in area after area are becoming mass produced and even automated. This trend will also increase. These increases are exponential by nature. Much, if not most, of the new information created or discovered has implications for and interrelationships with other knowledge already discovered as well as that which will be created in the future. New information frequently affects information in other areas and even in other disciplines. For example, the unravelling of the genetic coding in the RNA molecule has major implications for many disciplines. Radio carbon dating has changed many sequences of history. In turn, this changed information gives rise to more changes in other disciplines.

It is not a small, simple exponential but a complex phenomenon of chain reactions. Some of them die out quickly; others, however, continue branching into many disciplines, having large reproduction factors with relatively short reproduction cycles, ranging from a few months to a few years.

New approaches are needed if this complexity is to result in progress rather than confusion and chaos; if, in other words, we are to anticipate and protect ourselves from the by-products of increasing knowledge.

This paper suggests an approach to the solution of part of the problem. The approach calls for the organization and synthesis of "new sciences" that will allow better comprehension of the interrelationships of knowledge which can lead to such problems as pollution and inundation of information and overpopulation. Interrelationships exist in many dimensions and play many roles. The more points of view we have, the greater chance we have to understand those interrelationships of which we are already aware and to discover new and unsuspected relationships.

The four areas which should be developed into new disciplines of science are: information, communications, systems and applications.

No one of these is entirely new. They are being pursued in varying degrees at present and in various combinations. The establishment of societies for cybernetics, general systems, etc., attests to this fact. These efforts, however, are not nearly strong enough to provide what is needed in the face of the present growth of complexity.

Therefore, it is recommended that these sciences, accompanied by the obvious parallel engineering fields, be explored, structured and pursued with increased breadth and support.

At present, interrelationships sought through research and study are somewhat unilateral and are usually confined to the discipline giving rise to them. Even so, there is more realization of their potential outside their parent discipline than ever before. But this is not nearly enough. More awareness of implications and predictability is needed.

If we postulate the four logical scientific domains of knowledge that provide a continuum from information to its ultimate use, and further require that these domains cross all disciplines at right angles to them, we should add the now missing dimension to our ability to understand the potential interrelationships of knowledge.

The  first  science  is  information  science.  In  order  to  define  it,  we  should  consider 
the  broadest  possible  definition  of  information.  Whether  or  not  the  definition  agrees 
with  that  in  the  dictionary  is  of  little  importance.  Information  exists  everywhere  in 
the universe around us. We could define information as the position of all the atoms and
molecules  in  the  universe  and  of  all  sets  and  combinations  of  those  atoms  and  molecules 
at  any  time. 

Let us next consider how information operates -- both as a result of man's knowledge
and reflection and as the processes constantly occurring in nature. Of course, we are
limited by far too little knowledge of both the thought and activities of man and of the
many facets of nature. We know that we must continue our investigations which will lead
to an increased knowledge about information.

Information exists within all of the organisms of nature. In fact, every cell of
every organism has molecular information that defines and determines the complete
description, make-up and functional specification of that organism. When organisms reproduce,
information about the one or two organisms giving rise to the reproduced organism is
transmitted in the reproduction process and a new piece of molecular information is created
in the process.

On  the  highest  level  of  complexity,  man  is  acted  on  not  only  by  his  environment  and 
his  reaction  to  it,  but  also  by  his  own  feelings  and  thoughts. 

As a result of these complex interactions, organisms have information which is
transmitted frequently, if not constantly, about the state of the organism and of the
sub-organisms or systems within the organism. For example, the nervous system is in constant
surveillance of the sensing by the human body of the change of state of the organs, of
the change of state of emotions and of the change of state of thinking.

Information  is  also  produced  by  the  activities  of  organisms.  Humans  observe,  design 
experiments,  create,  think,  feed  and  indulge  in  various  activities  which  increase  the 
amount  of  information.  In  a  scholarly  discipline,  such  as  a  science,  this  increase  in 
information  is  sought  after  in  an  organized  and  logical  fashion. 

However,  information  itself  is  not  sufficiently  investigated  as  a  scientific  entity. 

The  definition  of  information,  its  roles,  and  its  contribution  to  the  disciplines  are 
more  frequently  by-products  of  disciplines  with  other  goals  than  an  understanding  of 
information per se. Information, its dynamics, the tools that elicit it and the tools
with which we use it deserve a special science that can lead us, through better
understanding, to a far better use.

Information  in  itself,  however,  is  of  little  use.  It  must  be  communicated.  We  must 
search scientifically for answers to such questions as: What is communication? How does
it take place? Why does it occur? What are its purposes? What forms of communication
are there? There is a need for a comprehensive science dedicated to the understanding of
communication within, among and between organisms.

Information  communicated  through  a  system  usually  has  a  purpose  or  application.  Thus, 
a  science  of  applications  following  a  science  of  systems  is  needed  to  provide  full  coverage 
of the information continuum.

Complete understanding, then, requires a knowledge of what information is, how it is
communicated, the systems used in transmittal, and, ultimately, its application. This
complex is, of course, referred to by some disciplines as "cybernetic systems."

External  information  and  communication  actions  of  organisms  include  communication  on  a 
personal basis, on societal, cultural, business, international and many other bases. Among
these,  too,  are  the  needs  for  a  better  understanding  of  what  communication  is,  what  a 
system  is  and  why  it  operates  or  is  applied. 

The four sciences should not be limited to the interface with, and participation in, the
physical, biological and social sciences. These sciences of information should transcend
and tie together all activities of man, including all of the disciplines, from art and
architecture  to  zoology. 

We  do  need  to  know  what  communication  is  and  what  a  system  is  from  a  sufficiently  broad 
point of view so that we understand their general properties and dynamics and can
anticipate their by-products.

There  are  many  research  and  development  activities  in  communications  and  systems  as  well 
as  in  information  science.  Most  of  these  have  been  organized  to  solve  particular  problems 
rather  than  to  seek  a  broad  understanding  of  the  dynamics  and  theory  of  information, 
communications  and  systems  sciences  (this  is  at  present  more  the  object  of  the  societies 
that  have  been  formed).  This  work  has  been  of  tremendous  value  and  it  is  not  intended  to 
detract  from  it. 

The  fourth  science  dealing  with  applications  should  be  more  thoroughly  explained.  For 
example,  how  does  man  apply  the  knowledge  of  the  information  that  he  has?  Most  of  the 
endeavors in our society are those that pertain to the use of knowledge; yet one of the
least perfect areas of scientific endeavor might be said to be that of applications.

A  science  of  applications  would  embrace  studies  leading  to  an  understanding  of  how  man 
really  uses  information.  This  could  lead  to  vastly  improved  functions  of  all  sorts.  For 
example, education is the application -- through a system of communication -- of the
knowledge and the information man has. The end product of education is the application
of knowledge to all phases of learning -- further scholarship, a professional career or
simply a fuller, better life. Yet the entire process is not very well understood. What
learning is needs to be much better understood from the standpoint of the application of
information. The best ways of communicating information are not known. Do we really
know what learning is? The systems called schools, although they are improving, are far
from achieving the best educational objectives because of our lack of knowledge of
application.

Applications  knowledge  impinges  on  the  arts,  and  while  the  present  state  of  the  arts  may 
seem  productive  and  satisfactory,  one  facet  of  art  is  in  the  ability  to  apply.  However, 
what  is  applied  and  how  it  is  applied  are  not  very  well  understood. 

The  list  of  examples  of  the  need  to  study  applications  should  also  include  questions 
of how man applies his knowledge of political, economic, and social sciences to the
relationships among nations. Here again, precision of application knowledge, in fact what
knowledge to apply, based on what information, is an enormous problem which man has failed
to solve over the centuries. Now that some sciences have succeeded in applying their
knowledge to the creation of devices with the capability to eliminate man from the face of
the earth, our inability to discover, to communicate, to construct fail-safe systems of
society leads to the realization that our application of man's knowledge needs even greater
depth and surer progress.

A  fifth  science,  which  is  not  a  new  science  at  all,  might  be  called  the  science  of 
science itself. It is time for science to be understood so broadly that its contributions
can be kept from being used for purposes that could harm mankind. Science should provide
more protection from the possible deleterious effects of its increasingly powerful
products. Scientists and technologists should provide fail-safe capabilities for their
creations,  the  ability  to  anticipate  undesirable  by-products,  and  leadership  in  the 
elimination  of  those  problems  of  the  world  for  which  man  has  been  responsible. 

This  may  seem  naive  and  impractical  to  some.  However,  continuation  of  the  present 
exploitation  of  science  and  technology  is  truly  naive. 

Now,  why  is  it  that  we  speak  of  new  sciences  and  of  new  engineering  areas?  We  are  aware 
that  most  of  the  activities  we  have  discussed  are  subsumed  under  existing  disciplines  such 
as  linguistics,  information  theory,  systems  science  and  other  current  activities  of 
scientific  and  technological  groups  seeking  to  unite  their  efforts  to  promote  more  general 
studies  of  these  subjects. 

We  speak  about  them  as  new  sciences  because  information  has  dynamic  properties.  That 
the pen is mightier than the sword is an old manifestation of the power of information.
Yet relatively little effort has been made to really understand information dynamics in
terms of our present needs. It is hopeful and promising to see societies for cybernetics,
general systems, information science, and so on, becoming concerned with some of the aspects
of  the  problem.  However,  it  is  quite  probable  that  the  problem  is  now  growing  at  a  rate 
beyond  the  capabilities  of  these  professional  societies.  The  dynamics  of  information,  its 
communication,  and  its  ultimate  application  in  the  various  systems  that  exist  or  that  have 
been  created  by  man  are  the  dynamics  of  chain  reactions. 

It  is  time  to  create  a  synthesis  that  can  include  all  uses  of  information.  We  must  plan 
for  the  most  effective  understanding  of  the  good  and  the  bad  effects  so  that  we  can  begin  to 
cope  with  the  complexity  that  we  are  creating  with  the  increasing  reactivity  of  chain 
reactions. 

Out  of  such  new  organization  of  disciplines  may  come  new  horizons,  new  approaches,  new 
concepts  and  new  capabilities  that  are  needed  for  considered  growth  and  implementation  in 
research  and  development  of  all  fields.  If  we  consider  the  field  of  information  science, 
it  is  quite  probable  that  a  group  working  to  develop  information  in  one  discipline  may  find 
resources,  methods  and  devices  that  will  be  useful  in  information-seeking  in  other  fields. 
Approaches  and  instruments  that  have  been  developed  in  scientific  research  seldom  stay  in 
the narrow field for which they were created. Methods at first used in, say, physics,
such as spectrography, nuclear magnetic resonance, calculating and measuring of any
sort, have found application, once their fundamental principles were defined and research
performed on the fundamental phenomenon, in chemistry and the life sciences, and have led
ultimately to quality control of the production of goods and their control of our society.

New principles, points of view, philosophies, definitions and specifications with respect
to information will be useful in many, if not all, fields. The exploratory work on
information in one field should provide insight, impetus, and working tools for other fields.
This should offer lower costs in terms of manpower and money. Of even more value than
lowered cost, in times of great need, such as the present, will be a shorter time lapse from
initiation to broad use.

Similar  statements  can  be  made  about  communication  science,  systems  science  and  the 
science of applications. Understanding of the systems that resulted from systems analysis,
beginning with the development of the aerospace field, has led to new approaches in
many fields, including the behavioural and life sciences. On the other hand, as more
understanding  of  the  operation  and  communication  of  information  in  the  organs  of  animal 
organisms  is  achieved,  new  insights  into  social  systems  will  evolve.  It  is  even  possible 
that  the  fail-safe  philosophy  developed  for  the  operation  of  nuclear  reactors  could  be 
translated into a method of international cooperation. For example, if international
cooperation reaches an impasse because of a disagreement over territory, trade agreements,
flow of materials or some other national phenomenon, agreement on a fail-safe philosophy
might be useful. Instead of pushing on to the brink of war, when non-agreement occurs,
the  relationships  among  particular  nations  will  drop  to  a  lower  level  of  safe  interaction 
from  which  war  is  not  possible.  This  will  gain  time  and  provide  a  meeting  ground  for 
consideration of all possible alternatives in a more objective atmosphere.

The  repertoire  of  illustrations  of  the  possibilities  of  use  of  these  sciences  is  almost 
endless  and  will  be  limited  only  by  the  imagination  and  creativity  of  the  disciplines 
involved. 

Research  and  development  have  reached  a  stage  where  the  unit  cost  of  knowledge  and 
productivity  is  increasing  exponentially.  Productivity  of  science  and  development  is 
also  increasing  proportionately.  An  even  greater  exponential  applies  to  the  need  for 
increased  research.  Information  and  knowledge  are  the  by-products  of  these  increases. 

They,  in  turn,  are  increasing  exponentially.  Each  new  unit  of  knowledge  brings  with  it 
the  possibility  of  interrelationships  with  other  fields.  Many  of  these  change  those 
fields  so  that  they,  in  transmitting  their  changed  states,  create  new  possibilities  of 
interchange  in  still  other  fields.  So  while  we  are  creating  more  and  more  knowledge,  the 
by-products  of  that  knowledge  are  increasing  at  an  exponential  rate.  This  is  probably 
an  exponential  product  of  the  exponential  growths.  The  ability  of  mankind  to  cope  with  the 
complexity thus created seems to lag farther and farther behind the creation of new
interactions. For that reason, more and even better research is needed and again, this will
create ever-increasing cost. We cannot slow down; we must increase our research efforts
as  well  as  our  reflection  on  its  results. 

We  need  to  increase  the  ability  to  reflect,  to  think  about,  and  to  create  more  research 
and at a lowered unit cost. The mass production of research, in fact, the achievement of
its  automation,  depends  on  new  concepts  and  new  directions. 

Hopefully, these new approaches will provide a design for coping with the problem of
rising complexity.


SUMMARY


Up-to-date developments in the fields of pattern and character recognition
are described, and technical and economic possibilities for the near future
are forecast by surveying present technical principles, present and potential
areas of application, and significant trends.

It is indicated that although character and pattern recognition are based
on the same techniques, their areas of application are quite diverse.
Character recognition techniques have been incorporated into commercially
acceptable computer peripheral equipment in the form of character readers,
whereas pattern recognition work is still in the experimental stage.


TRENDS AND DEVELOPMENTS IN CHARACTER
AND PATTERN RECOGNITION

L.  A.  Peidelman 


1.  INTRODUCTION 

We,  as  human  beings,  have  constantly  looked  for  ways  to  make  life  easier,  be  more 
productive,  and  extend  our  capabilities.  Such  pursuits  have  resulted  in  the  design  of 
the  computer. 

The computer's main function is the control and manipulation of data. Although this
function has been carried out in an efficient and expeditious manner, the means of
presenting this data to the computer are slow. In most cases, the common technique has
been to take data from its source and have a keypunch operator translate it from
human-readable to machine-language form. However, such translation has proved to be too costly,
time-consuming,  and  unreliable. 

The  problem  was  then  to  design  equipment  that  would  efficiently  and  economically  provide 
a direct interface between the computer and its data environment. A solution to this
problem has been accomplished via the technologies commonly known as pattern and character
recognition.  Pattern  recognition  denotes  the  automatic  identification  of  all  patterns. 
Character  recognition  specifically  relates  to  the  automatic  identification  of  alphanumeric 
characters  and  symbol  patterns.  Character  recognition,  which  is  technically  included 
under  pattern  recognition,  warrants  special  consideration  since  it  is  implemented  in  a 
commercially available device. The more complex problem of automatic recognition of
general  patterns  is  basically  in  the  realm  of  research. 

This paper presents a description of the pattern and character recognition technologies,
present  and  potential  application  areas,  and  significant  technical  and  economic  trends. 


2. DEFINITION OF TECHNIQUES

Pattern  recognition  can  be  defined  as  a  technique  for  automatic  identification  of  a 
given figure or arrangement which is known to belong to one of a finite set of pattern
classes.  This  figure  may  relate  to  a  missile  launch  site  on  an  aerial  photograph,  a 
tumor  on  an  X-ray,  or  resistor  and  capacitor  symbols  on  an  electric  circuit  diagram.  The 
automatic  reading  of  patterns  replaces  the  present  method  because  it  eliminates  visual 
inspection  of  each  film  frame.  Not  only  is  this  task  physically  exhausting,  it  is  also 
prone to errors.

Character recognition is a technique for automatic identification of alphanumeric
characters or symbols. This technique has relatively clearly defined property
characteristics as compared with the general class of patterns. The character recognition
technique has been used in a device called a character reader, which is primarily a
replacement for the keypunching and card reading operation. The character reader permits
printed, typewritten, or handwritten data to be entered directly into the data processing
system from the source document. In practical operation, this direct conversion is not always
possible due to uncontrolled data preparation conditions, so a retranscription of data
via typing is necessary. However, this typing operation has proven to be faster, more
reliable, and more efficient than keypunching; it also requires fewer hours of training.


3. EQUIPMENT PRINCIPLES


The same equipment principles may be employed, with some modifications, in both areas
although character recognition is simpler from a recognition standpoint. The general
configuration  for  a  pattern  or  character  recognizer  is  shown  in  Figure  1. 

3.1 Transport

The  transport  unit  is  used  to  move  the  film  or  paper  form  past  the  scanner.  Since  these 
units  are  mechanical  in  nature,  they  must  position  the  data  to  be  read  and  move  it  at  a 
proper  speed.  The  present  units  have  been  perfected  to  handle  various  sizes,  types, 
weights,  and  thicknesses  of  forms,  but  they  are  still  the  slowest  part  of  the  system. 

3.2 Scanner

The  function  of  the  scanner  is  to  convert  the  pattern  appearing  on  the  film  or  paper 
form  into  some  analog  or  digital  representation.  There  are  two  basic  types  of  scanners: 
magnetic  and  optical.  Magnetic  scanners,  which  apply  only  to  character  readers,  use  a 
magnetic  read  head  to  sense  variations  in  magnetic  flux  produced  by  the  difference  between 
the  magnetic  mark  and  its  background.  Optical  scanners,  on  the  other  hand,  employ  a  light 
source  to  detect  contrasts  between  the  pattern  and  its  background. 
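
The conversion performed by an optical scanner can be pictured with a small sketch. It assumes, purely for illustration, that the scanner delivers reflectance samples between 0 (dark) and 1 (bright) and that a single threshold separates pattern from background; the function name and the sample values are hypothetical and are not taken from any particular device.

```python
def binarize(scan_lines, threshold=None):
    """Convert reflectance samples into a digital pattern of '#' and '.'.

    scan_lines: list of scan lines, each a list of samples in the range 0..1
                (an assumed representation, not a real scanner format).
    threshold:  samples darker than this count as part of the pattern; by
                default it is placed midway between the darkest and the
                brightest sample seen on the form.
    """
    samples = [value for line in scan_lines for value in line]
    if threshold is None:
        threshold = (min(samples) + max(samples)) / 2
    return [
        "".join("#" if value < threshold else "." for value in line)
        for line in scan_lines
    ]


if __name__ == "__main__":
    # A dark vertical stroke on a bright background, sampled on two scan lines.
    for row in binarize([[0.9, 0.2, 0.9], [0.8, 0.1, 0.85]]):
        print(row)
```

Placing the threshold midway between the extremes is only one possible choice; it illustrates why contrast between the pattern and its background matters to the scanner.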

Different optical scanner types being employed include rotating mechanical discs, flying
spot scanners, parallel photocell or "retina" photocell sampling techniques, or vidicon
television camera tubes. At present, the flying spot scanner is the most commonly used
optical scanner for pattern and character recognition due to its ability to adjust the scan
pattern and its high resolution. Within character recognition, the retina photocell
arrangement  permits  the  fastest  sampling  of  characters.  The  vidicon  tube  scanner,  used 
primarily  for  character  recognition,  is  inherently  the  fastest  scanning  technique,  but  it 
can  read  only  a  limited  number  of  characters  on  a  form  (approximately  45  characters  per 
form)  due  to  the  tube  resolution. 

3.3 Recognition

The recognition unit, which is the heart of the system, has the function of extracting
significant  properties  from  the  pattern  and  identifying  them  according  to  class.  In  early 
character  recognition  work,  this  recognition  function  was  implemented  in  a  special-purpose 
hardwired  device.  At  present,  both  character  and  pattern  recognition  rely  basically  on  a 
computer  or  computer-type  device,  with  some  programmable  control,  for  the  recognition 
function. 

3.3.1  Character  Recognition 

Property  definition  for  character  recognition  relates  to  the  formation  differences 
within the given character set to be read. This character set is defined by the font or
style of the characters and is determined by whether alphanumerics or numerics only are
in  the  set.  Identification  of  a  character  is  accomplished  by  matching  patterns  from  the 
scanner  against  reference  patterns  for  each  character. 

The font most widely used in the United States and adopted as a standard by the
American Bankers Association is E-13B (see Figure 2), which can be used to represent only
10 numerics and four special symbols. Another font (see Figure 3), developed by Compagnie
des Machines BULL - General Electric, is capable of representing all the characters in the
alphabet as well as all the numeric symbols, and has been adopted as a standard by the
European  banking  community. 

The significant property differences among the characters in the E-13B font (see Figure 2)
are defined by the voltage waveform produced by a line-by-line scan. Identification is
accomplished by matching against reference waveforms. The BULL font shown in Figure 3
differentiates characters by variations in the width of the gaps between the seven strokes
of each character. A character is identified by comparing the sequence and number of narrow
and wide gaps with stored codes for each of the alphanumeric characters.
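
Matching a scanned waveform against reference waveforms can be sketched as follows. This is a minimal illustration only: the reference waveforms are invented placeholders, not real E-13B signals, and normalized correlation with a rejection threshold is one assumed way of scoring a match, not necessarily the method used in commercial readers.

```python
import math

# Hypothetical reference waveforms: read-head voltage sampled at equal steps
# across one character. These values are invented for illustration.
REFERENCES = {
    "0": [0.0, 1.0, 0.2, 0.0, 0.0, -0.2, -1.0, 0.0],
    "1": [0.0, 0.5, 1.0, 0.0, -1.0, -0.5, 0.0, 0.0],
    "2": [0.0, 1.0, -0.5, 0.5, -0.5, 0.5, -1.0, 0.0],
}


def correlation(a, b):
    """Normalized correlation of two equal-length waveforms (1.0 = same shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(waveform, references=REFERENCES, threshold=0.8):
    """Return the best-matching character, or None if no reference is close enough."""
    best = max(references, key=lambda ch: correlation(waveform, references[ch]))
    if correlation(waveform, references[best]) >= threshold:
        return best
    return None


if __name__ == "__main__":
    noisy_one = [0.05, 0.45, 0.9, 0.1, -0.95, -0.5, 0.0, 0.05]
    print(identify(noisy_one))
```

Returning None instead of the nearest character models a reject, which a reader would presumably route to manual handling rather than risk a substitution error.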

Optical  readers  fall  into  three  classes:  character,  mark  sense,  and  bar  code. 

Optical  character  readers  (OCR)  recognize  the  actual  character  by  directly  reading  its 
outline  or  shape.  The  present  OCR  readers  are  capable  of  reading  a  large  variety  of 
character  fonts  that  may  be  printed,  typewritten,  or  hand  lettered.  The  sophistication  of 
devices  varies;  some  read  only  a  single  font  while  others  can  read  multiple  fonts.  An 
attempt  towards  standardizing  fonts  has  produced  the  USASI  (see  Figure  4)  and  ISO-B  (see 
Figure  5)  type  fonts  adopted  by  the  United  States  and  certain  European  countries, 
respectively. 

The  identification  of  such  characters  has  been  implemented  basically  by  a  matrix  matching 
technique  where  the  scanned  elements  of  the  characters  are  matched  against  references  by 
means of resistor matrices. Another prominent technique, called stroke analysis,
differentiates characters by the position or frequency of vertical and/or horizontal strokes. The
character  pattern  is  then  matched  against  a  truth  table  indicating  stroke  formations  for 
each  reference  character.  Curve  tracing,  a  newer  technique  employed  for  handprinted 
recognition,  follows  the  character  outline  indicating  certain  features  such  as  character 
splits,  line  intersections,  line  magnitudes,  and  line  straightness. 
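
The matrix matching idea can be illustrated in software, although the text describes it as implemented with resistor matrices. The sketch below assumes a tiny 5 x 3 cell grid and three invented reference characters; the agreement score and the rejection limit are illustrative assumptions.

```python
# Hypothetical 5x3 reference matrices; '#' marks a dark cell, '.' a light one.
REFERENCES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def agreement(scanned, reference):
    """Fraction of cells on which the scanned and reference matrices agree."""
    cells = [
        s == r
        for scanned_row, reference_row in zip(scanned, reference)
        for s, r in zip(scanned_row, reference_row)
    ]
    return sum(cells) / len(cells)


def match(scanned, references=REFERENCES, min_score=0.85):
    """Return (character, score); character is None when the best score is too low."""
    scores = {ch: agreement(scanned, ref) for ch, ref in references.items()}
    best = max(scores, key=scores.get)
    if scores[best] >= min_score:
        return best, scores[best]
    return None, scores[best]


if __name__ == "__main__":
    # A 'T' with one cell of the stem missing.
    print(match(["###", ".#.", ".#.", "...", ".#."]))
```

A stroke-analysis reader would replace the cell-by-cell score with a lookup of detected strokes in a truth table, but the final step, choosing the reference that fits best, is the same.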

Mark-sense readers sense the physical position or location of marks on a document
correlating the mark position to a previously defined equivalent character. This technique
requires preprinted forms. Most present OCR readers provide mark-sensing as an option.
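
As a rough sketch of mark sensing, suppose a preprinted form has one column per digit position and ten rows per column, one for each digit. The layout, the function name and the use of "?" for an unreadable column are all assumptions made for the example.

```python
def read_mark_sense(marks, n_columns, row_values="0123456789"):
    """Translate sensed mark positions into characters.

    marks:      iterable of (column, row) positions where a mark was sensed.
    n_columns:  number of character positions on the (hypothetical) form.
    row_values: the character that each row of a column stands for.
    A column with no mark or with more than one mark is reported as '?'.
    """
    rows_by_column = {}
    for column, row in marks:
        rows_by_column.setdefault(column, []).append(row)

    characters = []
    for column in range(n_columns):
        rows = rows_by_column.get(column, [])
        characters.append(row_values[rows[0]] if len(rows) == 1 else "?")
    return "".join(characters)


if __name__ == "__main__":
    print(read_mark_sense([(0, 4), (1, 2), (2, 7)], n_columns=3))
```

The dependence on a fixed row and column layout is what makes preprinted forms a requirement for this technique.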

Bar-code  readers  utilize  thick  line  or  bar  representations  of  characters.  Each  character 
is defined by a given number of long and short bars, and can be identified by matching the
code  against  a  character  reference  table. 
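
Decoding by reference table might look like the sketch below. The code itself is hypothetical: each digit is assumed to be five bars of which exactly two are long, which gives ten combinations; no actual commercial bar code is being reproduced.

```python
from itertools import combinations

BARS_PER_CHARACTER = 5

# Build a hypothetical reference table: every way of choosing two long bars
# ('L') out of five, the rest being short ('S'), is assigned to one digit.
CODE_TABLE = {}
for digit, long_positions in enumerate(combinations(range(BARS_PER_CHARACTER), 2)):
    code = "".join(
        "L" if position in long_positions else "S"
        for position in range(BARS_PER_CHARACTER)
    )
    CODE_TABLE[code] = str(digit)


def decode(bars):
    """Decode a string of 'L'/'S' bars, five per character; '?' marks a bad group."""
    characters = []
    for start in range(0, len(bars), BARS_PER_CHARACTER):
        group = bars[start:start + BARS_PER_CHARACTER]
        characters.append(CODE_TABLE.get(group, "?"))
    return "".join(characters)


if __name__ == "__main__":
    print(decode("LLSSS" + "SSSLL"))
```

Because every valid group in this assumed code has exactly two long bars, a single misread bar always yields a group that is not in the table and is therefore detected rather than silently decoded as another digit.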

3.3.2  Pattern  Recognition 

Pattern property definition and identification techniques can be described only in
general  terms  since  they  are  highly  dependent  upon  the  specific  characteristics  of  the 
patterns  to  be  recognized.  Property  definition  relates  to  the  separation  of  patterns  into 
classes  or  categories.  This  separation  is  attempted  by  determining  the  relative  invariance 
of  patterns  in  terms  of  significant,  relevant,  and  interdependent  features  and  their  changes 
in  the  time  domain.  Such  pattern  features  include  size,  location  on  film  or  paper,  curves, 
slopes, symmetry, and grey levels.

The recognition techniques must identify the specific pattern in terms of a unique
combination of pattern features which have been changed due to noise. The techniques
used  are  quite  numerous  with  no  one  technique  gaining  complete  general  acceptance. 

A significant amount of research has been devoted to the development of "learning
machines" involving continual adjustment of the recognition logic to new combinations of
patterns or new probabilities of a pattern's occurrence. These techniques basically use
a statistical approach combined with property weighting schemes for identification.
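
One simple form of property weighting with continual adjustment is the error-correcting rule sketched below, in which the weights are changed only when a training pattern is misclassified. It is offered as an assumed example of the general idea; the two-class setting, the feature values and the function names are illustrative and do not describe any specific learning machine.

```python
def train(samples, n_features, epochs=20, rate=1.0):
    """Adjust property weights until every training pattern is classified correctly.

    samples: list of (features, label) pairs, label being +1 or -1.
    Returns n_features weights followed by one bias weight.
    """
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        errors = 0
        for features, label in samples:
            inputs = list(features) + [1.0]  # constant input for the bias
            score = sum(w * x for w, x in zip(weights, inputs))
            predicted = 1 if score > 0 else -1
            if predicted != label:
                # Move the weights toward the correct answer for this pattern.
                weights = [w + rate * label * x for w, x in zip(weights, inputs)]
                errors += 1
        if errors == 0:
            break
    return weights


def classify(weights, features):
    """Return +1 or -1 for a pattern described by its property values."""
    inputs = list(features) + [1.0]
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > 0 else -1


if __name__ == "__main__":
    training = [((2.0, 1.0), 1), ((3.0, 0.5), 1), ((0.5, 2.0), -1), ((1.0, 3.0), -1)]
    learned = train(training, n_features=2)
    print(learned, classify(learned, (2.5, 1.0)))
```

New training patterns can be presented at any time and the same rule applied, which is the sense in which the recognition logic is adjusted continually.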

Another fundamental technique is to construct a decision tree with each node relating to
one property. The specific pattern is identified by a process of elimination. Curve
tracing, as employed for character recognition, is a third technique; and many others
involving  topographical  considerations  are  presently  being  considered. 
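
The decision-tree technique can be sketched with circuit-diagram symbols of the kind mentioned in Section 2. The properties tested at each node and the tree itself are invented for the example; a real tree would depend on the properties that can actually be extracted from the patterns.

```python
# Each node is (property, subtree if present, subtree if absent); a leaf is the
# name of a pattern class. The properties are hypothetical illustrations.
TREE = (
    "has_zigzag",
    "resistor",
    (
        "has_parallel_plates",
        "capacitor",
        ("has_coil", "inductor", "unknown"),
    ),
)


def classify(properties, node=TREE):
    """Walk the tree, testing one property per node, until a leaf is reached."""
    while not isinstance(node, str):
        name, if_present, if_absent = node
        node = if_present if properties.get(name, False) else if_absent
    return node


if __name__ == "__main__":
    print(classify({"has_parallel_plates": True}))
```

Each test eliminates the classes on the branch not taken, so a pattern is identified after at most as many tests as the tree is deep.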


4.  PRESENT  AND  POTENTIAL  APPLICATION  AREAS 

The  application  areas  for  character  and  pattern  recognition  are  diverse.  Character 
recognition work is business-oriented, and directed toward supplying data more efficiently
to the computer. Pattern recognition is science-oriented, and directed toward the analysis
of visual information which has little semantic value.

4.1  Pattern  Recognition 

Present pattern recognition has been concentrated basically in three areas:

(a)  Aerial  Photography 

The enormous amount of aerial photographic interpretation now required has resulted
in a definite need for automatic identification of tactical and strategic targets
and discrimination of terrains as a preselected aid to photo interpretation by
humans.  The  problem  of  determining  a  unique  representation  of  a  target  and 
accounting  for  its  variations  is  still  not  solved.  Also,  extraction  of  noise  from 
pattern  is  a  serious  problem.  The  recognition  of  terrain  types  is  somewhat  easier 
since  analysis  of  picture  detail  and  grey  level  classification  is  a  solution  key. 
Once  the  properties  are  defined,  then  some  statistical  approach,  such  as  Bayes 
procedure,  can  be  employed. 

(b)  Medical  Field 

Pattern  recognition  as  an  aid  to  medical  analysis  is  concerned  with  such  problems 
as  (a)  analysis  of  X-ray  films  for  tumors,  (b)  irregularities  in  blood  cells  or 
other  parts  of  the  body,  and  (c)  classification  of  blood  cells.  Again  the  basic 
problem is defining the properties. Most work is accomplished by sampling results
and  looking  for  significant  characteristic  differences  rather  than  using  any  preset 
decision  rule. 

The  objective,  of  course,  is  to  relieve  the  doctor  from  examining  all  possible  data 
when  he  need  be  concerned  with  only  selected  groupings.  This  change  would  reduce 
the  valuable  time  a  doctor  must  spend  looking  at  X-rays,  for  example;  it  also 
reduces  the  eye  fatigue  that  may  lead  to  an  improper  analysis. 

(c)  Voice  Prints 

Voice-print identification is a method by which people can be identified from a
spectrographic examination of their voice. This method, which is analogous to
fingerprint  identification,  is  to  be  accomplished  by  uniquely  defining  people 
according  to  their  utterances.  The  technique  is  based  on  examining  the  amplitude 
contours  as  affected  by  people's  vocal  cavities  and  articulators.  The  size  of 
vocal cavities and the use of the articulators by different people are claimed to be unique.
Speech  contours  of  various  people  are  still  being  examined  to  determine  (a)  complete 
uniqueness,  (b)  voice  changes  of  people  with  time,  and  (c)  disguising  of  voices. 
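
Returning to the statistical approach mentioned under aerial photography in 4.1(a), a Bayes procedure for terrain classification from grey level can be sketched as follows. The terrain classes, their prior probabilities, and the mean and spread of grey level for each class are invented numbers, and a normal distribution of grey level within a class is an assumption made only to keep the example short.

```python
import math

# Hypothetical terrain classes: (prior probability, mean grey level, std. deviation)
# on an assumed 0..255 grey scale.
CLASSES = {
    "water": (0.2, 40.0, 10.0),
    "forest": (0.5, 90.0, 15.0),
    "sand": (0.3, 200.0, 20.0),
}


def normal_density(x, mean, std):
    """Density of a normal distribution, used here as the class-conditional model."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def posteriors(grey_level):
    """Probability of each class given the observed grey level (Bayes' rule)."""
    joint = {
        name: prior * normal_density(grey_level, mean, std)
        for name, (prior, mean, std) in CLASSES.items()
    }
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}


def classify(grey_level):
    """Choose the class with the largest posterior probability."""
    result = posteriors(grey_level)
    return max(result, key=result.get)


if __name__ == "__main__":
    for level in (45, 100, 190):
        print(level, classify(level))
```

In practice several properties (picture detail as well as grey level) would be combined, but the decision rule, choosing the class with the highest posterior probability, would be the same.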

4.2 Character Recognition

OCR devices are slowly taking over the market from magnetic character readers because
OCR offers high flexibility. It can read various types of source data, offers increased
reliability, and is substantially competitive in cost with the magnetic reader when single
fonts are used. New OCR devices incorporate the mark-sense and bar-code features. The
magnetic character reader's basic advantages are as follows: (1) the security problem is
better handled because magnetic characters cannot be forged as easily as regular recorded
characters; (2) it can read when dirt is a problem; and (3) it can read when overstamping
is a problem.

The present purchase price of commercial magnetic readers averages around $80,000. The
prices for optical readers range from $90,000 to $600,000 depending on the speed and
sophistication of recognition (rentals run from $3,000 to $20,000 a month).


4.2.1 The character reader application has fallen into three basic areas:

(a)  In-House  Applications 

In  this  situation  the  character  reader  is  replacing  the  keypunch  machine.  All 
document  preparation  and  character  reading  are  done  in  a  centralized  location, 
permitting tight document control. If the present application requires approximately
eight to ten keypunch operators, then it becomes a definite candidate for
the  use  of  a  character  reader.  In  this  situation,  the  single  font  reader  is 
recommended  due  to  the  controlled  conditions. 

(b)  Turn-Around  Documents 

These documents are prepared by the computer, sent out, returned, and then read
back  into  the  system.  The  single  font  reader  is  excellent  for  this  situation. 

(c)  Field  Documents 

This represents the most uncontrolled situation, and the character readers are
becoming more adept at reading different forms with different kinds of typing;
they are even going into the handwritten application. The bulk of present equipment
(multifont and handprinted readers) is concentrated on meeting applications
in this area.

4.2.2  A  representative  list  of  uses  for  character  readers  is  as  follows: 

(a)  Oil  Industry 

The oil industry has been using OCR devices to read embossed cards where the
customer has a credit card and the data are transferred from the credit card onto
the document by means of imprinters. Added to this is the cost of the particular
purchase, which can also be put in by the imprinter, by mark sense, or by hand in
the  future.  The  oil  industry  also  has  turn-around  document  situations  in  which 
statements  are  sent  out  to  customers  and  have  to  be  returned  and  read  by  the 
system.  Invoice  billing  of  the  various  service  stations  is  another  area  where  the 
oil companies can use character readers. Small-cost single-font readers are applicable.
Handprinted  character  readers  can  be  used  although  they  are  not  necessary. 

(b)  Transportation 

The airlines in the United States have taken an extensive look at optical readers.
TWA and United are using optical readers to read the preprinted ticket numbers at
the bottom of the ticket. A future use may be to have the reader read tickets and
issue boarding passes or read ticket baggage. This application will require the
character reader to go from its normal batch processing jobs to real-time jobs with
the physical separation of the scanner unit from the recognition unit. Data can be
transmitted by means of facsimile transmission between the scanner and the recognition
unit.

(c)  Banks 

The banks have used magnetic character readers for reading preprinted check information.
However, a trend towards OCR devices is apparent in banks; they are using
them to read name and address changes, installment loan information, stock transfer
information, etc. for input to the computer.

(d)  Inventory  Control 

One problem of inventory control is getting the source data from various distribution
points into a centralized data processing system where they can be processed by the
computer system. Input is at present accomplished by keypunching. Conversion to
source data automation equipment such as OCR or magnetic encoders is now being
performed with success and should continue.

(e) Insurance Company

The insurance companies have been using the optical readers, basically mark sense,
to  read  application  forms  and  have  not  fully  exhausted  all  the  uses  of  the  reader. 

(f)  Post  Office 

The  US  Post  Office  is  now  experimenting  with  multifont  alphanumeric  readers  to  read 
typewritten  business  mail  (which  accounts  for  approximately  70%  of  the  total).  The 
readers  are  capable  of  reading  the  entire  alphanumeric  address  and  zip  code.  Future 
developments include a character reader capable of reading handwriting and printing.

(g)  Publishing  Houses 

Optical readers are used to convert subscription forms and returned customer statements
into machine language. A new potential application is the reading of book
covers returned to the publisher for money refund.

(h)  Social  Security 

The  US  Social  Security  Office  in  Maryland  is  utilizing  a  multifont  character  reader 
for  directly  reading  employee  tax  information  as  received  by  employers. 


5.  TRENDS 

Awareness  of  both  character  and  pattern  recognition  has  increased  significantly  in  the 
past  year  basically  due  to  the  commercial  acceptance  of  the  character  reader.  Once 
considered  a  research  device,  this  reader  is  now  taking  its  place  in  the  data  processing 
system.  Therefore,  pattern  recognition,  which  has  taken  secondary  consideration,  can  now 
be  given  more  technical  attention.  The  future  challenges  to  the  scientist  and  engineer 
definitely  lie  in  the  area  of  pattern  recognition. 

While pattern recognition work is involved in property definition and determination of
proper  identification  algorithms,  character  recognition  work  is  based  on  increasing  the 
reader  flexibility,  reliability,  and  speed  while  reducing  the  costs  to  cover  a  wider  area. 

In  this  respect  there  are  eight  basic  trends  in  character  readers. 

(a)  Software 

There  is  an  upward  trend  to  use  a  programmable  unit  for  recognition  rather  than  to 
rely on special purpose hardware. Programmable units increase flexibility; e.g.,
they  can  read  forms  having  different  character  fonts  and  field  formats.  In  addition, 
the  programmable  unit  can  be  used  for  data  extraction,  sequencing,  and  manipulation 
to  reduce  the  computer  load  further. 

(b)  Recognition  of  Handwriting 

The  work  being  done  on  the  recognition  of  handwritten  characters  can  be  divided  into 
two classes: handprinted and script. The ability to read numeric handwritten
characters  already  exists  in  the  readers  manufactured  by  IBM  and  Optical  Scanning 
Corporation.  Recognition  Equipment  has  recently  announced  the  capability  to  read 
handprinted  alphanumerics.  The  IBM  1287  Optical  Reader,  for  example,  can  read 
handprinted  numeric  digits  and  five  alphabetic  control  symbols,  but  a  glimpse  at 
the rigid set of rules shown in Figure 6 emphasizes that the concept is still quite
restricted  in  practice.  However,  reading  of  script  characters  is  only  in  the 
developmental  stage,  the  Post  Office  being  the  primary  customer. 

(c)  Context  Recognition 

Context  recognition,  a  long-range  effort  to  reduce  reject  and  error  rates,  is  an 
attempt  to  simulate  the  human  ability  to  apply  contextual  significance  to  characters 
or elements which might otherwise be devoid of meaning. When a person reads, the
legibility of individual letters, or even of individual words, is not usually
critical. Human beings "read" or perceive letters within the context of the entire
word, and words within the context of an entire sentence. Consequently, the reader
easily identifies "Sxrxxt" (where the "x's" represent garble), in the phrase "2231
South 12th Sxrxxt", as "Street" even though only 50% of the letters are readable.
This identification is possible because of the conditional dependency of letters
and words in human communication.

Although  context  recognition  is  not  yet  sophisticated  enough  to  become  a  major 
factor  in  a  recognition  scheme,  it  can  be  used  as  a  back-up  method  for  identifying 
illegible  characters.  The  most  obvious  advantage  of  this  technique  lies  in  its 
potential to identify a complete word in which one or more characters may present
serious  recognition  difficulties. 
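
The back-up role described above can be illustrated with a minimal sketch. It assumes a small word list (the `LEXICON` below is hypothetical) and treats each garbled position as a wildcard, so that only words agreeing with every legible character survive. It is not a description of any reader's actual recognition logic.

```python
import re

# Hypothetical word list; a practical system would need a far larger lexicon.
LEXICON = ["Street", "Avenue", "Road", "Drive", "South", "North"]

def resolve_garbled(word, lexicon=LEXICON, garble="x"):
    """Return the lexicon entries consistent with the legible characters of `word`.

    Every occurrence of `garble` is assumed to mark an unreadable character,
    so a genuine letter "x" would also be treated as a wildcard in this sketch.
    """
    # A garbled position matches any single character; legible ones must match exactly.
    pattern = "".join("." if ch == garble else re.escape(ch) for ch in word)
    regex = re.compile(pattern, re.IGNORECASE)
    return [entry for entry in lexicon if regex.fullmatch(entry)]

candidates = resolve_garbled("Sxrxxt")
print(candidates)
```

With a larger lexicon several candidates may survive, in which case the surrounding words (here "South 12th") would be needed to choose among them.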

(d)  Off-Line  Versus  On-Line 

One  of  the  benefits  of  OCR  is  placing  the  input  peripheral  device  off-line  (in  most 
cases).  The  on-line  card  reader  has  slowed  down  the  computer  and  has,  in  some 
cases,  required  a  separate  computer  for  preprocessing.  The  trend  of  character 
readers is toward independent off-line units producing magnetic tapes for computer
processing. 

(e)  Speed 

Another,  but  less  critical,  area  of  developmental  emphasis  in  character  readers  is 
in  speed.  Reading  speed  is  at  present  limited  by  the  amount  of  time  required  to 
mechanically  move  the  document  past  the  reading  station.  The  overlapping  of  the 
reading and transport operations is accomplished by using storage tubes (i.e.,
vidicon scanners) or by reading "on the fly". Speed can also be increased by using
form controls which perform selective field reading and skip blank spaces. The
reliability of the character reader not only affects its accuracy, but also has a
significant impact on document-reading capability; actual reading speed is obviously
affected  if  a  document  must  be  read  more  than  once. 

(f)  Improvements  in  Reliability 

Naturally, reliability in the form of low error and reject rates is a prime consideration
in all the development work being done on character readers. One approach
to reduce these rates is to improve the resolution of the scanning units and thereby
increase the number of sample points from which the equipment can make an identification.
As previously mentioned, Philco-Ford Corporation is using a cathode-ray tube
that  has  a  resolution  of  2000  optical  lines.  Even  better  resolution  can  be  expected 
in  the  near  future. 

The  reading  reliability  of  the  character  readers,  in  terms  of  reject  and  error  rates, 
has improved substantially, due more to source document preparation control,
typist training, proofreading, and special checks within the character reader than
to the recognition logic itself. The reject rate for in-house form preparation
is presently two to three percent. Based on improvements in preparation, this
reject rate is expected to drop to below 1% in the next five years, and error
rates to fall below 0.5%. Keypunching error rates are presently 1.5%.

(g)  Cost 

Present  commercially  available  character  readers  are  designed  for  large-scale 
operations (more than 10,000 documents per day), in which cost can be justified.

There  is,  however,  a  definite  need  for  a  low-cost  single-font  character  reader 
(approximately  $20,000)  which  could  read  fixed-format  single-document  types.  In 
view  of  the  recent  character  set  standardization,  it  would  appear  that  a  trend 
toward  such  a  device  is  now  likely. 

(h)  Remote  Scanners 

The  use  of  remote  scanners  connected  in  a  time-sharing  configuration  with  a 
centralized  recognition  unit  is  within  the  state  of  the  art  and  can  be  expected 
soon. 


6.  CONCLUSION 


Our  world  exhibits  the  confidence  that  no  scientific  challenge  is  too  great  to  meet 
sooner  or  later.  Pattern  and  character  recognition  development  is  no  exception,  and 
present work indicates that the next decade will see amazing results. The character
reader  has  proven  its  ability  to  meet  such  a  challenge;  pattern  recognition  is  a  step 
away. 

Fig. 1 Block diagram of pattern/character recognizer reader (output to computer on-line or via magnetic tape, paper tape, cards, etc.)

Fig. 3 Sample of BULL magnetic reader type font characters

Fig. 4 USASI font

Fig. 5 ISO Class B font

Fig. 6 Handwriting rules for IBM 1287 (correct and incorrect examples; the legible rules include: close loops, do not link characters, connect lines, block print)


DISCUSSION


R.J. Dubon: Are there any plans, in the USA, to standardize the fonts used in the printing
industry?

L.A. Feidelman: Standardization of type fonts, paper and ink is required to aid character
recognition. A standard has been drawn up by representatives of US computer manufacturers.
There is also an International Standard Font.


P. Molzberger: Present-day multifont reading machines require expensive hardware or very
long programs additional to that of the main computer. Is there likely to be a trend in
computer manufacturing to make it possible to run an ordinary computer as a highly
specialised, parallel working recognition-logic complex?

L.A. Feidelman: In a parallel working system it would still be necessary to have a separate
processor, separate memory, etc., so that this is probably not the answer. In many ways an
off-line recognition system is the better method.


H.F. Vessey: Why have some organisations ceased using optical scanning for bibliographic
input to the computer?

M.S. Day: NASA has tried using optical readers for input, and although it works, it has
been found to be too costly at present. Hopefully, costs will come down in the future.


S. Skoumal: Can you foresee the use of central recognition hardware providing input to a
remote linked computer?

L.A. Feidelman: This is a possibility. Airlines plan to have remote scanners at boarding
points and information from passenger tickets will be passed to a central facility to see
if the information matches that already held before the passenger is allowed aboard.


PAPER  6 


EFFICIENT  TRANSFER  OF 
TEXTUAL  INFORMATION 
by 


J.  W.  Altman 


American  Institute  for  Research, 
Pittsburgh,  USA 


SUMMARY 


Three  problems  which  arise  in  attempts  to  achieve  efficient  provision  of 
textual  information  to  scientists  and  engineers  are  defined: 

(a)  Text-sensitive  tasks  of  scientists  and  engineers  have  not  been 
delineated  and  analysed  sufficiently  to  define  clearly  their 
requirements  for  textual  information  support. 

(b)  Methods  have  not  yet  been  established  to  permit  characterisation  of 
text  in  terms  which  support  the  development  of  a  technology  for 
efficient  transfer  of  text  to  users. 

(c)  Little  concentrated  effort  has  been  put  into  attempts  to  establish 
lawful relationships between text-sensitive tasks and characteristics
of text.

Ways  of  tackling  these  problems  are  suggested. 


1.  THE  PROBLEM 

Price®  has  described  the  growth  of  scientifically  based  knowledge  in  graphic  terms: 

"... any young scientist, starting now and looking back at the end of his career upon
a normal life span, will find that 80 to 90 percent of all scientific work achieved by
the end of the period will have taken place before his eyes, and that only 10 to 20
percent will antedate his experience (pp. 2-3)."

Under such circumstances, the need for efficient transfer of information from authors
to  potential  users  hardly  requires  belabouring.  Yet,  three  central  problems  remain: 

(a) Text-sensitive tasks of scientists and engineers have not been delineated and
analyzed in such a way as to define clearly their requirements for textual information
support.

(b)  Methods  have  not  yet  been  established  which  permit  characterization  of  text  in 
terms  which  support  generation  of  a  technology  for  efficient  transfer  of  text  to 
users. 

(c)  Little  concentrated  effort  has  gone  into  attempts  to  establish  lawful  relationships 
between text-sensitive tasks and textual characteristics.

It is the purpose of this paper to review some of the findings already available concerning
these issues and to suggest ways in which they might be resolved.


2.  DEFINITIONS 

The use of the terms "transfer," "text," "information," and "efficiency" in this paper
requires some definition. "Transfer" is used here to refer to the process of transmitting
knowledge from an author to potential users of that knowledge. It is assumed that authors
will, in the main, follow the precepts commonly accepted for effective technical writing.
It is also assumed that any machine or manual processing of the author's text that is implied
here can be implemented within the general scope of available techniques. Consequently,
this paper will not deal with textual processing, storage, and retrieval as
such. Rather, it will emphasize how the requirements for such processing, storage, and
retrieval should be established. "Text" is used here to refer to written narrative,
tables, illustrations, graphs, formulas, or any combination of them. "Information" is any
identifiable  influence  on  behaviour  other  than  a  direct  physical  restraint  or  physiological 
impairment.  Textual  information  or  text-carried  information  is  thus  behaviour  which  can 
be  demonstrated  to  be  a  result  of  exposure  to  given  text.  For  present  purposes,  it  can  be 
seen  that  information  is  in  the  behaviour  of  the  user  rather  than  being  a  directly 
measurable characteristic of text.

"Efficiency" will be discussed primarily in terms of achieving quality scientific and
engineering task performance with minimum expenditure of user time to exploit the supporting
text.

3. PHASES OF TEXT USE

My  primary  structure  will  reflect  the  principal  phases  of  textual  use,  as  follows: 

(a)  Screening. 

(b)  Gaining  and  maintaining  awareness  of  a  scientific  or  technical  field. 

(c) Application of text-mediated information.

Figure  1  shows  a  schematic  summary  of  relationships  among  these  three  phases  and  their 
major  sub-phases.  I  will  not  tarry  here  to  discuss  these  phases  of  text  use  but  will 
discuss  their  salient  features  in  passing  as  I  attempt  to  identify  some  of  the  more 
critical  information  transfer  problems  for  each  phase. 

Perhaps  a  word  of  caution  would  be  in  order  here,  however.  The  phases  of  text  use 
presented  here  are  somewhat  arbitrarily  chosen  to  support  the  current  discussion.  It  is 
not my intention to imply a reflection of a well-mapped domain of human behaviour, for
such  mapping  has  not  yet  taken  place. 

3.1 Screening

Document  screening  can  be  conceived  as  occurring  in  three  principal  waves.  The  first 
wave  involves  selection  of  a  set  of  candidate  documents  from  the  total  body  of  scientific 
and technical text. The second wave involves selection from the set of initially considered
documents those that will receive serious scrutiny. The third wave includes reading
of documents and selection of those parts that will be remembered or applied. This
third  wave  is  so  inextricably  entwined  with  maintaining  awareness  and  application  that  I 
shall  deal  with  only  the  first  two  waves  here. 

3.1.1  Search 

(a)  Retrieval  Based  on  Recall 

The  sequence  from  identification  of  a  technical  task  to  be  performed  to  the  availability 
of  a  tentative  bibliography  from  which  to  select  documents  for  detailed  review  is  perhaps 
the phase currently containing the most dysfunctions between textual information systems
and scientist-engineer needs. It is hardly surprising that this is a phase of text use
beset  with  difficulty  since  it  is  one  in  which  it  is  necessary  to  go  from  the  entire  body 
of  available  scientific  and  technical  documentation  to  a  relatively  short  list  of 
reasonable  candidates.  Neither  is  it  surprising  that  there  has  been  a  preoccupation, 
over  the  last  decade  or  so,  with  retrieval  schemes  to  support  the  purposes  of  this  phase. 

I  shall  not  attempt  here  to  review  the  current  status  of  document  retrieval  systems. 
Rather,  I  would  like  simply  to  draw  a  distinction  between  recall  systems  and  non-recall 
systems  and  to  discuss  some  of  the  salient  features  of  each.  Recall  systems  are  based  on 
the  user's  recollection  of  a  specific  document.  I  shall  not  discuss  the  mechanics  of 
retrieving  specific  documents,  whether  remembered  with  fidelity  or  semi-reliably.  However, 
it may be appropriate to point out that the citation index is one technique for projecting
one's memory ahead. That is, if one remembers a given document as being relevant to a
task at hand, by reference to a citation index, it is possible to identify subsequent
documents which cited the remembered document, which presumably are likely to be related
to  the  remembered  document. 
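
The forward look-up that a citation index provides can be sketched as the inversion of each document's reference list. The identifiers and the `references` mapping below are hypothetical and serve only to show the mechanism.

```python
from collections import defaultdict

def build_citation_index(references):
    """Invert {citing document: [cited documents]} into {cited document: [citing documents]}."""
    index = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            index[cited].append(citing)
    return index

# Hypothetical reference lists: each key cites the documents in its list.
references = {
    "doc_1966": ["doc_1960"],
    "doc_1967": ["doc_1960", "doc_1966"],
    "doc_1968": ["doc_1966"],
}

index = build_citation_index(references)
# Later documents that cited the remembered document "doc_1960".
later_work = sorted(index["doc_1960"])
print(later_work)
```

Starting from one remembered document, the same look-up can be repeated on each result to follow a line of work forward in time.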

Recall-based  retrieval  tends  to  follow  rather  classic  patterns  and  to  be  accomplished 
without  undue  difficulty.  Whenever  one  either  remembers  specific  documents  or  knows  who 
is  doing  the  most  related  work,  retrieval  of  appropriate  documentation  is  likely  not  to 
pose serious conceptual problems. Indeed, "in" members of "invisible colleges"
generally claim little difficulty in maintaining currency with the scientific-technical
documentation of greatest relevance.

(b) Retrieval Based on Descriptors

If  a  researcher  is  not  willing  to  depend  upon  his  own  memory  or  the  knowledge  of  his 
friends  to  ensure  comprehensive  coverage  of  relevant  documents,  he  is  faced  with  a  much 
more  difficult  problem.  True,  he  may  have  a  whole  armoury  of  descriptor  languages, 
structural  models,  organizing  principles  for  files  and  codes,  search  procedures,  and 
automated document retrieval aids; but it is impossible to pretend that the sum
total of our existing retrieval systems will necessarily suffice to ensure an adequate
search of all the appropriate documentation. The inadequacies of existing retrieval
systems are likely to become particularly apparent in coping with the newer and less
rigorously delineated fields, precisely those in which most of the exciting action is
likely to be taking place.

Since  the  body  of  reasonably  current  scientific-technical  literature  is  too  large  for 
any  of  us  to  thumb  through  in  a  lifetime  and  is  growing  at  an  accelerating  rate  that  would 
put  us  further  behind  at  the  end  than  when  we  started,  we  must  depend  for  document 
retrieval  upon  procedures  that  will  preclude  our  having  to  look  at  any  significant  portion 
of  the  total  body  of  scientific-technical  text.  If  we  are  to  avoid  the  vagaries  of  random 
sampling, this means that we must depend upon analogs of analogs. Each scientific-technical
document represents a symbolic analog of some phenomenological domain. Yet,
the  documents  themselves  represent  too  bulky  a  phenomenological  field  to  be  dealt  with 
directly.  If  we  are  to  have  facile  ways  of  dealing  with  this  body  of  symbolic  analogs, 
we  must  depend  upon  yet  greater  abstractions  (simplifications)  of  these  analogs  to  the 
real  world.  But  what  kinds  of  analogs  will  suffice  for  this  purpose? 

I  cannot  hope  to  review  here  the  great  number  and  variety  of  schemes  which  have  been 
devised  to  provide  a  facile  analog  to  a  corpus  of  documents.  Instead,  I  will  explore 
briefly  one  approach  which  seems  to  have  promise  for  clarifying  the  underlying  logic  of 
document retrieval systems as well as practical implications for improvement in retrieval,
particularly  automatic,  of  documents. 

Ossorio has demonstrated the possibility of classifying documents in n-dimensional
Euclidean space. Without espousing Ossorio's particular definition or approach, it is
possible to see how dimensionalizing the domain of documents and the phenomena to which
they refer can enhance retrieval of documents. Such dimensionalizing can be based on
multivariate statistical analysis, as was Ossorio's, or on more direct analysis and representation
of the phenomenological fields with which science and technology deal.

Let  us  suppose  that  we  have  structured  a  given  field  according  to  n  orthogonal 
(uncorrelated)  dimensions.  Let  us  further  imagine  that  we  are  able  to  scale  each  of  the 
descriptors  for  any  document  according  to  one  or  more  of  the  underlying  dimensions  which 
provide  gross  structure  for  the  field.  Finally,  let  us  assume  that  we  can  frame  our 
request  for  a  document  search  in  descriptors,  each  of  which  can  also  be  scaled  according 
to  one  or  more  of  the  underlying  dimensions. 

Given  these  assumptions,  it  can  readily  be  seen  that,  even  though  different  terms  may 
be  used  to  describe  documents  and  requests  for  document  search,  documents  and  search  have 
a common frame of reference in the underlying dimensions used to structure the phenomenological
field. With such a common frame of reference, it should be possible to derive a
meaningful  quantitative  basis  for  matching  documents  and  requests. 

It may be an adequate first-approximation assumption that the probable utility of a
document  will  be  a  monotonic  increasing  function  of  the  following: 
[Expression for U not recoverable from the source]

where:

U is a figure of merit for probable utility of a given document for a given request,

$\sum^{D}$ indicates a summation across all of the dimensions used to structure the field,

$\sum$ indicates a summation across all of the terms scalable on a given dimension (given
n terms for the document scalable on the dimension and k terms for the request so
scalable, there would be nk factors under $\sum$),

d is the dimensional scaling difference (without regard to sign) of a given term in
the document description versus a given term in the retrieval request,

D is the range of a given dimensional scale, and

n is a power function probably best determined by empirical study of the rate at which
utility tends to fall off as a function of distance between document and request
along a given dimension.
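
As a sketch of how such a figure of merit could be computed, the function below assumes one particular form that is consistent with the definitions above but is not taken from the source: each document-term and request-term pair on a dimension contributes (1 - d/D) raised to the power n, and the contributions are summed over all pairs and all dimensions. Because the text uses D and n in two senses each, the code names the scale range `scale_range` and the exponent `power`; the data layout is likewise an assumption.

```python
def utility(doc_terms, req_terms, ranges, power=2.0):
    """Figure of merit U under the ASSUMED form: sum over dimensions and term pairs of (1 - d/D)**n.

    doc_terms, req_terms: {dimension: [scale values of the terms on that dimension]}
    ranges: {dimension: range D of that dimensional scale}
    power: the exponent n, to be fixed by empirical study
    """
    total = 0.0
    for dim, scale_range in ranges.items():
        for doc_value in doc_terms.get(dim, []):
            for req_value in req_terms.get(dim, []):
                d = abs(doc_value - req_value)  # scaling difference without regard to sign
                total += (1.0 - d / scale_range) ** power
    return total

# Hypothetical scalings on a single dimension "a" whose scale has range 10.
ranges = {"a": 10.0}
request = {"a": [2.0]}
near_doc = {"a": [2.0, 4.0]}
far_doc = {"a": [8.0, 9.0]}

score_near = utility(near_doc, request, ranges)
score_far = utility(far_doc, request, ranges)
print(score_near, score_far)
```

Under this assumed form a document whose terms lie close to those of the request scores higher than one whose terms lie far away, which is the behaviour the definitions call for; other decreasing functions of d/D would serve equally well as a starting point.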

The  probable  utility  of  a  given  document  is  not  only  a  function  of  its  convergence  with 
a  potential  user’s  needs,  but  may  also  be  influenced  by  the  redundancy  of  that  document 
with another. That is, the user is likely to have use for a given piece of information
only  once.  If  it  is  contained  in  more  than  one  document,  all  but  one  document  may  be 
superfluous.  Consequently,  it  may  be  desirable  to  present  to  the  possible  user  only  the 
most  recent  document  of  any  sets  which  are  extremely  close  on  all  dimensions. 
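
The suppression of redundant documents suggested above can be sketched as follows. The tuple layout, the `tolerance` threshold for "extremely close", and the sample values are all assumptions made for illustration.

```python
def drop_redundant(documents, tolerance=0.05):
    """Keep only the most recent document of any set that is close on all dimensions.

    documents: list of (identifier, year, coordinates), where coordinates is a
    tuple of scale values, one per dimension, in the same order for every document.
    """
    kept = []
    # Visit the newest documents first so that the survivor of each set is the most recent.
    for doc in sorted(documents, key=lambda item: item[1], reverse=True):
        coords = doc[2]
        is_duplicate = any(
            all(abs(a - b) <= tolerance for a, b in zip(coords, other[2]))
            for other in kept
        )
        if not is_duplicate:
            kept.append(doc)
    return kept

# Hypothetical documents scaled on two dimensions; A and B nearly coincide.
documents = [
    ("A", 1965, (0.10, 0.50)),
    ("B", 1968, (0.12, 0.51)),
    ("C", 1967, (0.90, 0.20)),
]
kept_ids = [doc[0] for doc in drop_redundant(documents)]
print(kept_ids)
```

The choice of tolerance governs how aggressively documents are merged, and would presumably need the same kind of empirical study as the exponent in the utility measure.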

3.1.2 Bibliographic Review

The  initial  search  for  documents  relevant  to  a  given  intended  use  should  result  in  a 
bibliographic listing of some type. It has been observed in a number of situations
that  the  accuracy  of  initial  document  screening  is  not  greatly  influenced  by  the  extent 
of  textual  detail  available  to  support  this  screening.  Also,  such  screening  with  highly 
attenuated  text  can  be  accomplished  generally  with  about  30  to  50  percent  saving  in  time 
over  the  use  of  full  text.  Consequently,  it  seems  entirely  appropriate  to  depend  upon 
highly  attenuated  text  for  such  initial  document  screening. 

Descriptive  titles  should  suffice  for  short,  relatively  homogeneous  articles.  Brief 
indicative abstracts or topic descriptors should suffice to represent longer or more
varied documents for purposes of initial screening by the user.

The  merits  of  attenuated  representations  of  documents  for  initial  screening  by  a 
potential  user  suggest  that  automatic  retrieval  systems  which  result  in  a  bibliographic 
output  modestly  amplified  by  key  descriptors  may,  indeed,  be  compatible  with  user  needs 
and  behavioral  tendencies.  Of  course,  this  does  not  mitigate  the  need  for  such  retrieval 
systems  to  furnish  appropriate  bibliographic  output  for  review. 

3.2 Gaining and Maintaining Awareness

Whether  one  is  retrieving  documents  for  immediate  application  or  simply  accumulating 
information  for  possible  future  purposes,  it  is  required  of  the  user  that  he  become  aware 
of  textual  content  through  reading.  This  truism  need  not  hide  the  fact,  however,  that 
the  textual  requirements  for  maintaining  current  awareness  and  gaining  awareness  of  text 
for an immediate and specific purpose are different. I will discuss the textual requirements
of maintaining general current awareness of a field prior to discussing textual
needs  for  specific  purposes. 

3.2.1 Maintaining Current Awareness

Several studies have demonstrated the utility of attenuated text (abstracts and
extracts) in comprehending the essential meaning of scientific and technical articles.


The  upper  limits  for  reduction  without  significant  loss  of  general  comprehension  tend  to 
be about 50 percent (or up to 75 percent for exceptionally verbose material) for words, and
25 to 30 percent for figures and formulas. Such textual reductions result in savings in
the  neighbourhood  of  15  to  35  percent  in  reading  and  comprehension  time.  Attenuation  of 
text  to  the  neighbourhood  of  85  to  95  percent  of  the  original  can  be  achieved  with  loss 
in  comprehension  (measured  in  terms  of  ability  to  answer  salient  questions  derived  from 
the  full  text)  from  10  to  50  percent. 

Differences  between  comprehension  from  full  text  versus  comprehension  from  attenuated 
text  tend  to  be  maximal  for  text  of  intermediate  complexity.  That  is,  comprehension 
levels  from  attenuated  and  full  text  tend  to  be  most  similar  where  comprehension  from 
full  text  is  either  very  poor  or  very  good. 

Before  passing  on  from  matters  of  maintaining  general  awareness  of  a  given  field  to 
more  specific  applications  of  text,  it  is  perhaps  appropriate  to  say  a  few  words  about 
possible improvements in selective dissemination for this purpose. In general, the
comments  made  concerning  the  screening  of  a  body  of  documents  for  retrieval  of  selected 
documents  to  serve  a  specific  purpose  are  also  relevant  here.  Just  as  specific  requests 
can,  at  least  theoretically,  be  scaled  according  to  some  set  of  underlying  dimensions 
which structure the field, so also might an individual’s general interests be similarly
scaled.  Documentation  having  similar  multidimensional  scaling  might  then  be  brought  to 
his  attention  on  a  regular  basis. 
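
The matching of scaled interests to scaled documents can be sketched as a nearest-neighbour comparison. This is a minimal illustration only: the dimension names, the coordinate values, and the choice of Euclidean distance are assumptions and are not taken from the studies cited.

```python
import math


def distance(a, b):
    """Euclidean distance between two points scaled on the same dimensions."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in a))


def nearest(profile, documents, k):
    """Return the ids of the k documents closest to the user's profile."""
    ranked = sorted(documents, key=lambda doc_id: distance(profile, documents[doc_id]))
    return ranked[:k]


# Hypothetical two-dimensional scaling of one user and three documents.
profile = {"theory": 0.2, "hardware": 0.9}
documents = {
    "doc-a": {"theory": 0.1, "hardware": 0.8},
    "doc-b": {"theory": 0.9, "hardware": 0.1},
    "doc-c": {"theory": 0.3, "hardware": 0.7},
}
print(nearest(profile, documents, 2))  # ['doc-a', 'doc-c']
```

The same distance could serve for a specific request in place of a standing profile, since both are assumed to be scaled on the dimensions used for the documents.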

3.2.2 Gaining Specific Awareness

If the purpose is to exploit a given set of pre-screened documents for a specific
application, the textual requirements are substantially different from those for maintaining
current general awareness. Abstracts and extracts can usefully support maintenance
of general awareness with interesting savings in the bulk of material disseminated and
in reading time, and with only modest loss in comprehension of the major content of the
original  document.  But  specific  applications  usually  require  greater  detail  than  is 
provided  by  abstracts  or  extracts. 

Payne and Hale found an average loss of about 30 percent (across a number of scientific
and technical fields) in efforts to use extensive descriptive abstracts as a source
of  specific  facts.  Interestingly,  neither  the  extensive  abstracts  nor  briefer  abstracts 
resulted  in  a  time  saving  over  the  use  of  full  text  for  fact  retrieval. 

This  relative  inadequacy  of  abstracts  and  extracts  for  specific  applications  makes  it 
tempting  to  conclude  that  the  original  document  is  required  in  most  cases  for  such  purposes. 
I  will  present  some  notions  in  the  following  section,  however,  which  suggest  that  such  a 
conclusion  may  be  premature. 

3.3  Application 

In  this  section  are  first  presented  some  of  the  considerations  relating  to  textual 
analysis.  Then,  text  characteristics  are  related  to  levels  of  application. 

3.3.1  Textual  Analysis 

Boldovici and Altman found that it was possible to break scientific and technical text
into  elements  which  they  called  “textual  units.” 

The operations involved in reducing a technical or scientific article into textual
units may be described as follows: (1) the article is first divided into its gross and
clearly  separable  parts;  (2)  each  resultant  section  is  then  sub-divided  into  parts  which 
can  be  separated  without  violence  to  the  apparent  intent  of  the  author;  and  (3)  the  parts 
are  then  sub-divided  into  progressively  smaller  parts  to  the  point  at  which  it  appears  that 
further division would obscure or violate the intent of the author.


The  uniformity  of  textual  units  resulting  from  the  procedure  described  above  may  be 
tested and improved by applying the following criteria:

(a)  A  textual  unit  is  a  segment  of  technical  text  which  may  impart  information 
(i.e., reduce uncertainty) when taken by itself.

(b)  A  textual  unit  expresses  a  complete  thought,  and  can  be  taken  out  of  context 
without  completely  losing  meaning. 

(c)  A  unit  of  technical  text  is  a  segment  of  written  and/or  pictorial  material  which, 
if  further  divided,  will  lose  meaning  unless  reconfigured  into  essentially  its 
original  form. 

(d)  All  of  the  material  in  a  textual  unit  is  so  interrelated  that  it  would  take  more 
words  to  explain  any  sub-division  than  are  contained  in  the  original  unit. 

Given  an  array  of  textual  units  which  corresponds  to  a  whole  article,  textual  analysis 
proceeds  by  categorizing  each  unit  as  to  whether  it  was  of  primary,  secondary,  or  tertiary 
importance  to  the  intent  of  the  author’s  communication. 

Textual units judged to bear primary importance to the author’s communicative intent
are  then  arranged  in  a  convenient  spatial  configuration  (e.g.,  from  left  to  right  on  a 
page)  which  reflects  the  apparent  temporal  progression  of  the  author’s  writing,  beginning 
with  problem  statements  and  ending  with  conclusions.  Secondary  textual  units  are  then 
attached  by  lines  to  the  primary  units  to  which  they  appear  most  relevant.  Tertiary  units 
are  similarly  attached  to  secondary  ones. 
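
The configuration just described can be pictured as a small tree of units. The following is a minimal sketch under stated assumptions: the class name `TextualUnit`, the numeric `rank` field, and the example texts are hypothetical, and the ranking itself is still assumed to come from a human analyst.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextualUnit:
    """One textual unit; `kind` is a Table 1 type such as PROBLEM or RESULT."""
    kind: str
    text: str
    rank: int  # 1 = primary, 2 = secondary, 3 = tertiary (assigned by an analyst)
    attached: List["TextualUnit"] = field(default_factory=list)

    def attach(self, unit: "TextualUnit") -> None:
        # A unit may only carry units of the next lower level of importance.
        if unit.rank != self.rank + 1:
            raise ValueError("attach secondary to primary, tertiary to secondary")
        self.attached.append(unit)


def outline(primaries: List[TextualUnit]) -> List[str]:
    """Flatten the configuration into indented lines, primary units in order."""
    lines: List[str] = []

    def walk(unit: TextualUnit) -> None:
        lines.append("  " * (unit.rank - 1) + f"{unit.kind}: {unit.text}")
        for child in unit.attached:
            walk(child)

    for unit in primaries:
        walk(unit)
    return lines


# Illustrative units only; the wording is invented for the example.
problem = TextualUnit("PROBLEM", "Screening full text is slow.", 1)
conclusion = TextualUnit("CONCLUSION", "Attenuated text suffices for screening.", 1)
result = TextualUnit("RESULT", "Screening time fell by 30 to 50 percent.", 2)
condition = TextualUnit("CONDITION", "Readers were technical staff.", 3)
conclusion.attach(result)
result.attach(condition)
print("\n".join(outline([problem, conclusion])))
```

Primary units are listed in the order of the author's progression, and each attached unit is indented beneath the unit to which it was judged most relevant.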

Major  types  of  textual  units  are  defined  in  Table  1.  Idealized  configurations  of  units 
for  deductive,  empirical  research,  and  developmental  studies  are  presented  in  Figures  2-4, 
respectively.

3.3.2 Levels of Application

We  might  now  use  the  foregoing  characterization  of  text  as  a  basis  for  relating  it  to 
levels  of  application,  arbitrarily  limiting  levels  to  the  following  four: 

•  Reflecting  simple  awareness. 

•  Reflecting  sophisticated  awareness  and/or  collation. 

•  Reflecting  analysis  and  synthesis. 

•  Reflecting  evaluation. 

Reflection  of  simple  awareness  may  require  familiarity  with  and  comprehension  of  only 
a  limited  set  of  textual  units--most  likely  those  involving  conclusions  and  implications. 
More  sophisticated  awareness  and  collation  of  material  from  different  documents  or 
portions  of  the  same  document  imply  a  need  for  at  least  selective  familiarity  with 
results, interpretations, and proof. Analysis or synthesis of results in terms other
than  those  of  the  original  author  implies  thorough  understanding  of  the  conditions  under 
which the results were obtained and the axioms or rationales which underlay their generation.
To evaluate the adequacy of previous work implies a detailed understanding of the
derivations  and  methods  as  well  as  the  results,  conclusions,  and  implications. 

It  can  be  readily  seen  that  there  is  a  hierarchy  of  applications  which  has  a  parallel 
range  of  needs  for  different  numbers  and  types  of  textual  units  for  minimal  appropriate 
support.  That  non-chance  relationships  exist  between  the  nature  of  applications  and  needs 
for  different  textual  units  seems  beyond  a  reasonable  doubt.  However,  the  nature,  strength, 
and stability of such relationships are essentially unknown at present.
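
One way to make the hierarchy concrete is a lookup from level of application to the types of textual unit it is assumed to need. The sets below are an interpretation of the preceding paragraphs, not measured data, and the cumulative reading (each level also needs what the lower levels need) is likewise an assumption.

```python
# Unit types read off the text for each level; an interpretation, not data.
LEVEL_NEEDS = {
    "simple awareness": {"CONCLUSION", "IMPLICATION"},
    "sophisticated awareness": {"RESULT", "INTERPRETATION", "PROOF"},
    "analysis and synthesis": {"CONDITION", "AXIOM", "RATIONALE"},
    "evaluation": {"DERIVATION", "METHOD"},
}
LEVEL_ORDER = list(LEVEL_NEEDS)  # dicts keep insertion order


def kinds_needed(level):
    """Union of the unit types for `level` and for every level below it."""
    needed = set()
    for name in LEVEL_ORDER[: LEVEL_ORDER.index(level) + 1]:
        needed |= LEVEL_NEEDS[name]
    return needed


def select_units(units, level):
    """Keep the (kind, text) pairs whose kind is needed at the given level."""
    needed = kinds_needed(level)
    return [(kind, text) for kind, text in units if kind in needed]


units = [("METHOD", "m"), ("RESULT", "r"), ("CONCLUSION", "c")]
print(select_units(units, "simple awareness"))  # [('CONCLUSION', 'c')]
```

Because the strength of these relationships is unknown, such a table would have to be treated as a starting hypothesis to be revised against observed use.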

Any  of  the  more  extensive  or  complex  scientific  and  technical  documents  is  likely  to 
have  a  number  of  more  or  less  independent  logical  strings  of  textual  units.  Any  given 
application is likely to have relevance to only a selected sub-set of the entire set of
strings in a given document. Given highly accessible and facile information systems in
the  future,  it  is  not  inconceivable  that  a  prospective  user  would  be  selectively  exposed 
to  only  those  strings  of  textual  units  having  likelihood  of  being  relevant  to  a  particular 
application and in an order most consistent with his scientific and technical task, with
subsequent  units  in  the  string  being  presented  on  demand. 


4.  CONCLUSIONS 

Based on the various findings and arguments advanced thus far, I would suggest the
following conclusions:

1. In the preparation of original text:

(a)  Some  of  the  extensive  attenuations  which  have  been  accomplished  in  text  without 
measurable  loss  of  information  suggest  that  there  may  be  considerable  reduction  yet 
possible  in  the  volume  of  text  generated  through  more  rigorous  attention  to  the 
preparation and monitoring of publication standards. This prospect seems particularly
rich when it is recalled that the attenuations were accomplished on articles
in published journals, which tend to be among the most terse of scientific and
technical  text. 

(b)  The  traditional  ordering  of  text  to  parallel  the  logical  sequence  followed  by  the 
investigator  may  be  less  than  optimum  for  the  individual  who  uses  text.  More 
optimal  orders  of  presentation  for  use  should  be  sought. 

(c) The delineation of textual units in the process of preparing text for storage and
retrieval may serve as a powerful aid to efficient textual transfer. Its potentials
should  be  investigated. 

2.  The  structuring  of  phenomenological  fields  by  the  methods  of  multivariate  analysis 
and  related  rational  processes  should  be  explored  as  an  aid  to: 

(a) Retrieval of documents on the basis of specific queries.

(b) Selective dissemination of document lists to match individual interests.

(c)  Characterization  and  retrieval  of  individual  textual  units  as  well  as  whole 
documents. 

3.  Abstracts  and  extracts  show  promise  as  efficient  aids  to  maintaining  current  awareness 
of a field. Only a rudimentary technology now exists concerning this issue. That
technology  should  be  strengthened  by  greatly  expanded  empirical  studies  of  the 
characteristics  of  attenuated  text  which  make  it  a  suitable  substitute  for  full  text. 

4.  Information  systems  to  date  have  concentrated  almost  exclusively  on  the  retrieval  of 
whole  documents.  Work  should  be  initiated  on  determining  the  feasibility  and  value  of 
retrieving  individual  textual  units  and  strings  of  related  units. 
TABLE 1

Definitions of Major Types of Textual Units


TOPIC--A statement whose main purpose is to delineate the nature and limits of
subordinate content.

PROBLEM--A statement of conditions which justify the establishment of a technical
objective.

PURPOSE--A statement of intended technical accomplishment.

DEFINITION--A statement having as its primary purpose the delineation of standard
reference for terms used elsewhere in the text.

METHOD--A description of the activities of the investigator or his surrogates.

RATIONALE--Justification of a method.

CONDITION--A description of environmental characteristics presumed to have
relevance to some result.

CONSTRAINT--A statement of conditions emphasizing the limits within which the
technical effort took place.

AXIOM--A proposition accepted on its intrinsic merit.

LEMMA--An auxiliary proposition accepted as true for use in demonstration of
another proposition.

THEOREM--A proposition subject to logical proof.

HYPOTHESIS--A proposition subject to empirical demonstration.

DERIVATION--An intermediate stage of logical manipulation between a theorem and
final demonstration or proof.

RESULT--A statement of observations presumed to have relevance to a theorem,
hypothesis, or condition of interest.

ENTITY--Description of a technique or device resulting from developmental
effort.

COROLLARY--An immediate derivation from a proven proposition.

PROOF--Logical demonstration of the truth of a theorem, within the limits of
truth of its related axioms.

INTERPRETATION--The bringing to bear of additional data, logical argument, or
relationships to clarify results.

CONCLUSION--A statement of belief about the reality, or reliability of a
finding.

IMPLICATION--A statement of a belief about the breadth, depth, or nature of
application for a finding.
DISCUSSION


D.Bosaan:  People  engaged  in  different  disciplines  attach  different  meanings  to  the  same 
descriptor. To what extent could this be incorporated in the concept of “user profiles”
introduced in the paper read by H. Dubon (Paper 3)? How useful would it be?

J.W. Altman: There is a very direct relationship between the meanings attached to descriptors
and user profiles. The system described by M. Dubon aims to give a match between
descriptors and profile but it could also be used to get a measure of the distance between
the profile and the descriptors, that is, a quantitative measure of how close the information
available is to that requested.


J.C.R. Licklider: One criticism of user studies is that they have paid too much attention
to  what  users  say  they  need  and  not  enough  to  what  they  actually  need.  Another  criticism 
is  that  such  studies  examine  existing  methods  and  techniques,  whereas  users  need  improved 
methods  and  techniques.  How  does  your  approach  get  away  from  existing  methods  and 
techniques? How does it find out what users need as distinct from what users do?

J.W. Altman: We have established a relationship between text and what users do with it.
But if you take the existing text and break it down into units and then manipulate the
text units in various ways you develop new configurations not in the original text. It
has been found that by eliminating redundancy it is possible to cut the existing text by
half without destroying the sense of the original. In this way it is possible to identify
principles which could have been applied to improve the text from the user’s point of view
at  the  time  when  it  was  written. 
SUMMARY


The  components  of  an  automatic  storage  and  retrieval  system  are  briefly 
reviewed. 

The  storage  process  consists  of  transcription  of  bibliographic  detail, 
and  sometimes  abstracts,  into  machine  readable  form,  generation  of  a 
thesaurus, and automatic indexing of documents using this thesaurus.

Finally,  an  automatic  process  may  be  applied  which  generates  a  library 
classification  system  for  the  collection. 

The interactive man-computer retrieval process, which follows the storage
process, offers the best potential for improving retrieval effectiveness.
The  user  may  search  thesauri,  classification  schedules,  catalogues  or  the 
documents,  in  a  manner  very  similar  to  that  employed  traditionally  in 
libraries  but  with  far  less  effort  and  at  much  greater  speed. 
ON-LINE INFORMATION STORAGE AND RETRIEVAL


N.S.  Prywes 


1.  DISCUSSION  OF  THE  PROBLEM 

This paper deals with the interrelated issues of pre-processing of documents and the
effectiveness  of  retrieval  of  documents.  Justifiably,  the  storage  and  retrieval  problem 
is currently of great concern. The effectiveness in retrieving documents is highly dependent
on the amount of labour and processing invested in the storage of the documents.
Namely, the retrieval is greatly facilitated by storage processing products such as
catalogues or storage allocation schemes. These are used in retrieval in referencing
catalogues or in following a convenient placing scheme while browsing through shelves of
documents.  In  effect,  the  problems  of  storage  and  retrieval  are  a  single  problem.  This 
paper reviews briefly the components of a total storage and retrieval system while referencing
relevant developments.

The storage process described includes all the functions which take place in libraries
and  information  centres  from  acquisition  to  the  placing  of  the  documents  in  the  repository. 
This process, which includes indexing, cataloguing and vocabulary maintenance, demands a
great deal of time and expertise. In any one of the large libraries or information centres,
there are thousands of monographs and serials that are waiting to be catalogued and
indexed. These often lie unused because of the dearth of competent cataloguers and indexers,
especially those expert in particular subjects and languages. The increased amount
of material which is being circulated soon may require substantial increase in staff.
Staff with this competence is extremely scarce; low salaries discourage young people from
library  work.  For  these  reasons  the  storage  process  tends  to  constitute  a  serious 
bottleneck. 

On  the  retrieval  side,  evaluation  tests  indicate  that  libraries  and  information  centres 
operate at a low, almost unacceptable retrieval effectiveness. The library user requiring
specific  information  is  overwhelmed  with  information,  much  of  which  is  irrelevant. 

The  mechanizing  of  procedures  in  an  information  centre  or  a  library  does  not  need  any 
more justification than the notion of mechanizing any other industrial, commercial, or
service function. The premise of this paper is that automatic storage processing and on-line
retrieval are competitive in effectiveness with manual procedures. The automatic
procedures  are  not  especially  complex  and  they  can  be  readily  applied. 

The automated storage processing discussed here includes the following steps.
Citations and sometimes abstracts of incoming documents are first transcribed into machine
readable form. Natural language processing of title and abstract results first in a concordance
of stem words. The concordance may also provide information about the frequency
of stem words. In a semi-automatic process, words may be omitted or added, or various relationships
established between words, to form an open-ended thesaurus. Then, based on this
thesaurus,  the  incoming  documents  are  automatically  indexed.  Finally,  an  automatic 
process  may  be  applied  which  generates  a  library  classification  system  for  the  collection. 
Such  a  classification  then  represents  a  scheme  for  placing  documents  on  shelves,  in 
microforms,  or  in  the  computer,  as  appropriate. 

The  interactive  man-computer  retrieval  process,  which  follows  the  storage  process, 
offers the best potential for improving retrieval effectiveness to the point where information
storage and retrieval systems become really useful. This interactive process has
a number of aspects. An individual can communicate with a central computer through a
remote terminal “on-line”, i.e., where the terminal is continuously monitored by a central
computer. The computer deletes, changes and analyzes the queries and retrieves information
in “real-time”, compatible with the normal working speed of a human. To fulfill these
functions,  the  computer  must  have  a  storage  capacity  of  billions  of  characters  with 
fractional  second  access  to  any  information. 

In  the  interactive  retrieval  process  the  user  may  search  thesauri,  classification 
schedules, catalogues or the documents in a manner very similar to that employed traditionally
in libraries, but with far less effort and at much greater speed. He may, for
instance,  reference  documents  by  title,  author,  publisher,  citation,  subject,  or  browse 
through  citations  or  abstracts  of  documents  on  a  common  subject,  placed  together  in  the 
memory  of  the  computer. 

Methods  and  procedures  like  those  described  in  this  paper,  such  as  content  analysis, 
concordance and thesaurus preparation and indexing, which require merely clerical procedures,
have been proposed for centuries. They have been opposed by those who believe
that manual processing of the document has a “quality” superior to algorithmic processing
based on selection of words from the abstract or even from the title. The manual approach
has a number of ancillary positions that are contested here. For instance, the manual
approach also conveys the notion that the subject term vocabulary needs to be controlled,
and that only highly competent persons in specific areas should exercise judgment in regard
to adding terms; these positions are contradictory to the approach in this paper.

The  objective  of  the  procedures  described  here  is  to  do  away  with  much  of  the  vocabulary 
maintenance  work  currently  prevalent;  especially  the  notes  and  instructions  directed  to 
indexers  and  cataloguers  which  would  not  be  required  in  an  automated  system. 


2.  STORAGE 

2.1 The Input of Documents and Content of the Repositories

The repository includes a collection of document representations. Each document is an
integral entity in the collection. It may be broken down as shown in Fig. 1. The examination
and analysis of documents in the storage or retrieval processes is usually conducted
in the order from top down of the information shown in Fig. 1. Generally, as one
proceeds downward in Fig. 1, greater depth and, frequently, greater volume of information
are  provided;  however,  less  frequent  access  is  required  to  the  more  voluminous  parts. 

The upper four boxes in Fig. 1 are said to contain association terms. These are words
or  terms  such  as  title,  author,  subject,  etc.,  which  identify  single  or  entire  classes  of 
documents.  The  association  terms  may  convey  information  about  various  relationships  among 
respective  documents,  such  as  having  a  subject  heading  or  citation  in  common,  or  sequence 
of events indicated by dates of publications, etc.

The  language  analysis  in  the  storage  process  could  be  based  on:  (a)  entire  text,  (b) 
association terms and abstracts, or (c) association terms only. The cost of transcription
into  machine  readable  form  decreases  greatly  as  the  amount  transcribed  is  reduced; 
however,  this  also  reduces  retrieval  effectiveness.  There  is  an  indication,  however,  that 
effectiveness  of  retrieval  based  on  the  subject  of  the  document  increases  considerably 
(20-25%) when the transcription of the abstract is added to the transcription of the association
terms. Language analysis of full text does not seem to improve the effectiveness
of retrieval sufficiently to warrant the considerably greater cost of transcription.
Also,  the  content  analysis  of  text  requires  more  complex  procedures,  including  syntactic 
or  semantic  analysis. 

The  document  collection  is  only  one  part  of  the  information  in  the  repository  as  shown 
in Fig. 2. The other parts contain directories of the association terms, and stratification
of  these  directories.  The  directories  may  be  considered  to  be  information  about  information. 


The  directories  may  be  generated  a  posteriori  from  the  documents  themselves.  Namely, 
the association terms shown in Fig. 1 may be extracted automatically from each document as
it  enters  the  collection.  In  this  way  the  concordance  of  the  terms  for  all  the  documents 
may  be  derived  automatically.  The  aggregate  of  the  various  types  of  association  terms 
then constitutes an all-inclusive directory or concordance of the association terms.
Further processing then establishes the higher level directories which contain assignment
of  terms  to  categories  and  a  variety  of  relationships  among  the  terms. 
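
The a posteriori derivation of a directory can be sketched as building an inverted index over the association terms of each incoming document. This is an illustration only: the field names, the sample records, and the function name `build_directory` are assumptions, not part of the system described.

```python
from collections import defaultdict


def build_directory(collection):
    """Map (term type, term) to the sorted ids of the documents carrying it."""
    directory = defaultdict(set)
    for doc_id, terms in collection.items():
        for term_type, values in terms.items():
            for value in values:
                directory[(term_type, value.lower())].add(doc_id)
    return {key: sorted(ids) for key, ids in directory.items()}


# Hypothetical association terms for two documents.
collection = {
    1: {"author": ["Prywes"], "subject": ["retrieval", "indexing"]},
    2: {"author": ["Altman"], "subject": ["retrieval"]},
}
directory = build_directory(collection)
print(directory[("subject", "retrieval")])  # [1, 2]
```

Nothing in the directory is decided before the documents arrive; every entry exists only because some document carries the term.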

The  generally  prevalent  approach  to  indexing  and  vocabulary  maintenance  is  that  of 
applying human judgement a priori. An example of this approach is the establishment of
the Dewey Decimal Classification which has divided the library collection into progressively
more specific classes. Using this system, professional indexers in libraries assign subject
headings (stated in terms of class numbers) from a controlled schedule to the
documents as they enter the libraries. In time, such a classification system must be expanded
and revised by the library community to recognize new areas not included in previous
schedules. Re-examination and reclassifying of documents already in the collection
is  then  necessary  to  assign  the  new  subject  headings  to  them. 

Figure  2  illustrates  the  a  priori  and  a  posteriori  approaches  to  generating  the 
directories as opposites. (A variety of mixes of these two approaches is possible.)

Retrieval  effectiveness  tests  indicate  that  a  posteriori  indexing  performs  as  well  as 
a  priori  indexing;  and  that  the  lack  of  term  control  in  a  posteriori  indexing  does  not 
cause deterioration in performance. This will be further discussed below in connection
with the evaluation of retrieval effectiveness.

2.2  Language  Processing 

The  simplest  language  processing  procedure  is  to  analyze  a  text  to  recognize  and  gener¬ 
ate  stems  of  words  encountered  in  the  input  material.  This  involves  recognizing  the 
suffixes of words. A suffix editing procedure for English is described by Stone et al.
A similar procedure for French has been described by Gardin and his associates. Similar
procedures have been developed by numerous other investigators. More sophisticated
procedures  including  matching  stem  words  against  a  thesaurus  and  syntactic  or  semantic 
analysis  of  text  may  be  employed  in  the  automatic  indexing  and  classification  as  discussed 
below. 
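
A crude suffix editing step can be sketched in a few lines. The suffix list and the minimum stem length below are illustrative assumptions and do not reproduce the table of any of the published procedures mentioned above.

```python
# Suffixes are tried longest first; the list is illustrative only.
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "ed", "es", "s")


def stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for word in ("indexing", "indexed", "documents", "classifications"):
    print(word, "->", stem(word))
```

Variants such as "indexing" and "indexed" reduce to the same stem, which is all the later concordance step requires; the stems need not be dictionary words.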

Natural  language  processing  and  machine  translation  research  are  relevant  since  many  of 
the  algorithms  developed  there  are  directly  applicable  to  automatic  indexing.  However, 
the  systems  employing  the  more  complex  procedures  are  highly  experimental  and  in  many 
cases  the  research  has  not  advanced  beyond  the  theoretical  considerations. 

2.3 Concordance and Thesaurus Generation

Although completely automatic thesaurus generation procedures have been under development
for some time, considerable experience has been accumulated with a semi-automatic
approach. Computer aids are provided, but human intellect is applied to the discrimination
and grouping of words. The first step in this process is to use computer aids which
accept the transcribed portions of the documents as an input and generate a concordance of
stem words. This concordance includes title or abstract words in addition to the other
association terms in Fig. 1. The computer aids also provide frequencies of occurrence for
the  words  in  the  concordance. 
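
The concordance and its frequency counts can be sketched as follows. This is a minimal sketch under assumptions: the sample titles are invented, and the `stem` argument defaults to simple lower-casing so that any suffix editing procedure could be substituted.

```python
import re
from collections import Counter, defaultdict


def concordance(texts, stem=str.lower):
    """Return (postings, frequency) for the stem words found in `texts`.

    postings maps each stem word to the sorted ids of documents containing it;
    frequency counts every occurrence across the whole collection.
    """
    postings = defaultdict(set)
    frequency = Counter()
    for doc_id, text in texts.items():
        for word in re.findall(r"[A-Za-z]+", text):
            key = stem(word)
            postings[key].add(doc_id)
            frequency[key] += 1
    return {key: sorted(ids) for key, ids in postings.items()}, frequency


# Hypothetical transcribed titles.
texts = {
    1: "Automatic indexing of documents",
    2: "Retrieval of indexed documents",
}
postings, frequency = concordance(texts)
print(postings["documents"], frequency["of"])  # [1, 2] 2
```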

The  first  step  in  deriving  a  thesaurus  may  be  the  elimination  of  the  very  high  and  very 
low frequency words. Another step would be the indicating of “broader”, “narrower” or
“related” relationships between words. Especially important also is the recognition of
synonyms. It is necessary to establish such relationships as the documents have been
written by many people at different times who use a variety of words to designate similar
meanings. Categories may be constituted which contain various instances of word usage;
each such word may be given in context. Another approach is to prepare a separate
thesaurus  for  specific  subject  areas,  where  appropriate  relationships  between  words  are 
established  in  the  context  of  the  subject  areas. 
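
The two steps named above, frequency pruning and the recording of relationships, can be sketched as follows. The frequency thresholds, the class name `Thesaurus`, and the sample words are assumptions for illustration; the paper gives no numerical cut-offs, and the relationships themselves are still supplied by a person.

```python
def prune(frequency, low, high):
    """Keep the words whose frequency lies within [low, high]."""
    return {word for word, n in frequency.items() if low <= n <= high}


class Thesaurus:
    """Holds synonym and broader-term relationships entered by a person."""

    def __init__(self):
        self.preferred = {}  # synonym -> preferred term
        self.broader = {}    # term -> its broader term

    def add_synonym(self, word, preferred):
        self.preferred[word] = preferred

    def add_broader(self, narrower, broader):
        self.broader[narrower] = broader

    def normalize(self, word):
        """Replace a synonym by its preferred term; other words pass through."""
        return self.preferred.get(word, word)

    def ancestors(self, term):
        """Follow broader-term links upward, stopping if a loop is met."""
        chain = []
        while term in self.broader and self.broader[term] not in chain:
            term = self.broader[term]
            chain.append(term)
        return chain


kept = prune({"the": 50, "index": 5, "zymurgy": 1}, low=2, high=20)
thesaurus = Thesaurus()
thesaurus.add_synonym("catalogue", "catalog")
thesaurus.add_broader("thesaurus", "vocabulary")
thesaurus.add_broader("vocabulary", "language")
print(kept, thesaurus.normalize("catalogue"), thesaurus.ancestors("thesaurus"))
```

The structure is open-ended in the sense used in this paper: terms and relationships can be added at any time without revising a fixed schedule.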

The  thesaurus  generation  process  is  similar  to  the  vocabulary  maintenance  functions  in 
conventional  libraries.  However,  the  on-line  automated  aids  may  provide  suggestions  with 
regard  to  words  and  categories  which  deserve  the  attention  of  the  individual  engaged  in 
establishing relationships among words. For instance, frequencies of terms used in retrieval
queries and index terms of relevant documents, which have been retrieved in response
to these queries, may serve as a guide regarding association and relationships
among  terms.  Various  statistics  about  frequencies  of  co-occurrence  of  terms  may  be  used 
to  combine  terms  into  phrases  which  will  be  used  in  their  entirety  as  a  single  term  in  the 
indexing  process.  Finally,  the  automatic  generation  of  a  classification,  described  later, 
may provide further information about grouping and sub-groupings of terms and respective
documents  to  form  progressively  more  generic  subject  areas. 
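
One simple co-occurrence statistic is the number of times two words appear side by side. The sketch below counts adjacent pairs and proposes the frequent ones as candidate phrases; the threshold, the sample word lists, and the restriction to adjacent words are assumptions, and other statistics could equally be used.

```python
from collections import Counter


def frequent_pairs(token_lists, min_count):
    """Return the adjacent word pairs seen at least `min_count` times."""
    pairs = Counter()
    for tokens in token_lists:
        pairs.update(zip(tokens, tokens[1:]))
    return {pair for pair, n in pairs.items() if n >= min_count}


# Hypothetical title words from three documents.
token_lists = [
    ["information", "retrieval", "system"],
    ["on-line", "information", "retrieval"],
    ["retrieval", "system", "design"],
]
phrases = frequent_pairs(token_lists, min_count=2)
print(sorted(phrases))
```

The candidate phrases would still be offered to the person maintaining the thesaurus, who decides which of them are to be used as single index terms.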

2.4  Automatic  Indexing 

Various automatic indexing approaches and systems have been described by Stevens.
The objective here is to review briefly the simplest procedures which have proved effective.
In  the  most  simple  procedures,  stem  words  derived  from  titles  or  abstracts  are  considered 
to  be  the  index  terms  of  the  respective  documents  without  reference  to  the  thesaurus  at 
all.  This  simple  process  has  proved  effective  for  retrieval  in  situations  where  a  user 
is  satisfied  with  retrieval  of  any  one  or  few  relevant  documents.  This  method  has  also 
proved especially effective in an interactive mode of search where the user may guide the
computer  in  search  for  relevant  material. 

Automatic indexing may, however, utilize far more sophisticated approaches. A perusal
of the thesaurus for stem words derived from titles or abstracts may result in important
indexing decisions. It could eliminate undesired terms, or assign documents to classes
or categories. A still more complex process may assign term phrases based on word
co-occurrences or on syntactic analysis.

2.5  Automatic  Generation  of  a  Classification  System  and  Assignment  of 
Location  For  Documents 

The automatic generation of a classification system in effect groups citations of documents
into cells in the memory of the computer, very much as the documents on a common subject are
grouped on respective library shelves. The retrieval process then consists of a search of
several shelf areas in a large library to find the documents relating to a subject on which
information is demanded. A classification system, automatic or conventional, has a dual
purpose. It is a methodology for placing like documents together, but it is also a
retrieval methodology by which one may be guided to the group of "like" documents which
deal with the area of his interest. Like conventional classifications, an automatic
classification system may be used to put documents away, but only after the classification
system itself is derived from the documents. That is, it does not precede the documents,
but follows them. The automatic classification process is a follow-up on the automatic
subject indexing. It attempts to put together in a cell documents which have most index
terms in common.
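
The idea of placing documents with many common index terms in the same cell can be illustrated with a small Python sketch. The greedy rule, the `min_shared` threshold and the sample data are assumptions chosen for brevity; the paper does not describe the actual procedure, and this is not a reconstruction of it.

```python
def assign_to_cells(docs, min_shared=2):
    """Greedily place each document in the cell sharing the most index terms."""
    cells = []  # each cell: {"terms": set of terms, "docs": list of ids}
    for doc_id, terms in docs.items():
        terms = set(terms)
        best, best_overlap = None, 0
        for cell in cells:
            overlap = len(cell["terms"] & terms)
            if overlap > best_overlap:
                best, best_overlap = cell, overlap
        if best is not None and best_overlap >= min_shared:
            best["docs"].append(doc_id)
            best["terms"] |= terms
        else:
            # No sufficiently similar cell exists yet, so open a new one.
            cells.append({"terms": terms, "docs": [doc_id]})
    return [cell["docs"] for cell in cells]


# Hypothetical documents with their index terms.
docs = {
    "D1": ["graph", "network", "theory"],
    "D2": ["index", "retrieval", "thesaurus"],
    "D3": ["graph", "network", "matrix"],
    "D4": ["index", "retrieval", "query"],
}
cells = assign_to_cells(docs)
print(cells)
```

Because the cells are derived from the documents themselves, the classification follows the collection rather than preceding it, which is the point made above.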

The scope of this paper does not permit a description of the process for automatically
creating a classification system. Various methodologies have been used for this process.
These consist of employing statistical techniques, computing "distances"
between documents, and employing co-occurrence of index terms. The latter
approach is simplest in terms of the complexity of the process and amount of processing
required. A collection composed of 4,000 documents with a vocabulary of 6,000 index
terms has been processed to date. Experiments are continuing at the University of
Pennsylvania with collections of tens of thousands of documents.

It is important to note here that automatic classification may be used not only to
complement a coordinate-indexing retrieval scheme; it also constitutes an alternative
to coordinate indexing. If used in a coordinate-indexing system, automatic classification
methodology provides a storage arrangement and a directory which greatly speed up the
search and retrieval. As an alternative to coordinate indexing, automatic classification
and the arrangement of documents in cells allow the user to direct the computer
in its search toward the area of interest. This is further described below in connection
with interactive retrieval techniques.


3.  RETRIEVAL 

3.1 Retrieval by Association Terms

A basic property of an on-line retrieval system is a man-computer language which
includes in its vocabulary all the association terms. (See Fig. 1) A simple search may be
initiated by the user communicating to the system a description of the desired information.

To "describe" a single document or a class of documents, it is necessary to supply in a query the
association terms as well as the relationships among the terms. The procedure consists
of specifying the association terms of the desired documents and the requisite logical
or arithmetic relationships among the terms or among other information elements within
the document. It is important that a user at the terminal should be capable of expressing
a query in terms most convenient for him. For that reason, ample choice must be given
to him to search by various types of association terms, such as author, publication,
title words, accession numbers, references, etc. In addition, he should be able to
reference the various directories, such as the thesaurus or the automatic classification,
to aid him in the selection of terms. Similarly, he should be able to specify, for instance,
a generic term to include all the narrower terms which correspond to it. Finally, he
should be able to examine the citations which are being retrieved by the system and
respond by indicating their relevance to his subject of interest. In these interactions
with the computer, the display formats of the computer responses are important to the
facility with which a system may be used. These formats are arranged to minimize the
user's labour in selecting terms or documents.
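
A query that combines association terms through logical relationships can be sketched as follows. The nested-tuple query form, the `type:value` naming of association terms and the sample directory are all hypothetical; they stand in for whatever query language a particular system provides.

```python
def evaluate(query, index, all_docs):
    """Evaluate a nested query of AND / OR / NOT over association terms."""
    if isinstance(query, str):
        # A plain string is an association term such as "author:prywes".
        return set(index.get(query, ()))
    op, *args = query
    results = [evaluate(arg, index, all_docs) for arg in args]
    if op == "AND":
        return set.intersection(*results)
    if op == "OR":
        return set.union(*results)
    if op == "NOT":
        return all_docs - results[0]
    raise ValueError(f"unknown operator: {op}")


# Hypothetical directory mapping association terms to accession numbers.
index = {
    "author:prywes": {1, 2},
    "title:retrieval": {2, 3, 4},
    "title:thesaurus": {4, 5},
}
all_docs = {1, 2, 3, 4, 5}
query = ("AND", "title:retrieval", ("NOT", "author:prywes"))
hits = evaluate(query, index, all_docs)
print(sorted(hits))
```

A generic term could be handled in the same framework by expanding it into an OR over its narrower terms before evaluation.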

On-line retrieval systems may be divided into two classes. The systems which aid user
formulation of queries and retrieve respective documents are referred to here as key-word
systems. The second type of system provides automatic reformulation of the query based
on indications from the user of satisfaction or dissatisfaction with the retrieved
material. In fact, in this manner the user guides and directs the search of the computer
system.

3.1.1.  Key-Word  Retrieval  Systems 

An outstanding example of the key-word system is the BOLD system at System Development
Corporation, developed by Borko. BOLD utilizes on-line displays which assist
the user both in acquiring a mastery of the system itself and in performing guided
searches. No language analysis technique is used in BOLD and the indexing is entirely
manual. The MULTILIST system at the University of Pennsylvania is another example of a
key-word retrieval capability based on list processing which facilitates split-second
retrieval from large document collections. The MULTILIST system includes both manually
indexed (Artificial Intelligence) and automatically indexed (Physics) collections.

BOLD and MULTILIST are representative of typical current systems. With these systems
retrieval is easier, but the basic content of the query is not altered except at the
insistence of the user. That is, while formulation of the query is assisted by the system,
there is no attempt at reformulation based on the results of previous searches.

3.1.2 Interactive Query Reformulating Systems

The procedure in retrieval with a reformulating system may be as follows. A user may
desire to search the collection to obtain a bibliography on a certain subject. He would
then submit a query to the system consisting of word-stem terms. These terms may be
found in directories (Fig. 2). The system then will use the automatic library
classification which has been generated to find the cell(s) which correspond to the largest
number of terms in the query. (Alternatively, weights may be associated with the terms and
cells are selected which have documents indexed with the maximum total weight of the terms.) The
user may then consider a number of citations from the respective cell or cells, and he
may indicate acceptance or rejection of certain citations as relevant or irrelevant,
respectively. The terms corresponding to the accepted or rejected documents will then be
examined by the computer and the initial query may be reformulated. It will include
additional terms derived from acceptable documents or it will omit some of the initial terms
that are in the rejected documents. Based on the newly reformulated query, a search is
repeated, new cells are found, and their content is displayed to the user. This process
may continue with the input from the user being primarily the approval or disapproval of
retrieved material.
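
One round of this procedure might look like the following Python sketch. The rule used here (add every term of an accepted document, drop an initial term only if it occurs in a rejected document and in no accepted one) is an assumption made to obtain something concrete; the text leaves the exact rule open, and the function names and sample cells are hypothetical.

```python
def best_cell(query, cells):
    """Pick the cell whose term set matches the largest number of query terms."""
    return max(cells, key=lambda name: len(cells[name] & query))


def reformulate(query, accepted, rejected):
    """Add terms of accepted documents; drop query terms seen only in rejected ones."""
    accepted_terms = set().union(*accepted) if accepted else set()
    rejected_terms = set().union(*rejected) if rejected else set()
    kept = {t for t in query if t not in rejected_terms or t in accepted_terms}
    return kept | accepted_terms


# Hypothetical cells, each summarised by the index terms of its documents.
cells = {
    "C1": {"graph", "network", "matrix"},
    "C2": {"index", "retrieval", "thesaurus", "query"},
}
query = {"retrieval", "network", "index"}
chosen = best_cell(query, cells)

# The user accepts one citation and rejects another (term sets shown).
accepted = [{"index", "retrieval", "thesaurus"}]
rejected = [{"network", "matrix"}]
new_query = reformulate(query, accepted, rejected)
print(chosen, sorted(new_query))
```

The reformulated query would then be passed to `best_cell` again, so that the user's only input in later rounds is approval or disapproval of what is displayed.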

This approach has been experimented with in the SMART Project and the results have been
evaluated to determine the effectiveness of this powerful strategy. Experiments with
this approach have also been conducted by Edwards.

3.2  Evaluation  of  Retrieval  Effectiveness 

As has been amply illustrated, there is a great variety of thesaurus generation and
automatic indexing strategies as well as of retrieval strategies. It is also quite
apparent that the selection of a strategy is very critical to the cost and retrieval
effectiveness of the system. An evaluation methodology has been developed to determine
the retrieval effectiveness of systems. As has been already indicated, increased costs
and labour in storage processing may result in improvement of retrieval effectiveness.
However, the cost incurred relative to the improvement in retrieval effectiveness
is very important. Also, for various retrieval applications, different degrees
of effectiveness in retrieval are required.

Although tests of retrieval effectiveness have often been seriously challenged on a
variety of grounds, two measures of retrieval effectiveness appear to receive wide
acceptance. One of these measures, the recall ratio, is the ratio of the number of
relevant documents retrieved to the total number of documents in a collection which are
relevant to a search. The other measure, the precision ratio, is the ratio of the number
of relevant documents retrieved to the total number of documents retrieved in a search.
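
The two ratios follow directly from these definitions. In the short Python sketch below, the document numbers and the size of the collection are made up for illustration; the two calls contrast a narrow search with retrieval of an entire (hypothetical) collection of 20 documents.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one search."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


relevant = {1, 2, 3, 4}
narrow = recall_precision({1, 2, 9}, relevant)
whole = recall_precision(set(range(1, 21)), relevant)
print(narrow, whole)
```

Retrieving the whole collection gives a recall of one at the price of low precision, which is why neither figure is meaningful on its own.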

For a sequence of queries interactively executed in the search, a plot can be made of
precision vs. recall. It is important to point out here that only the conjunction of
these two measures is meaningful as an indication of the effectiveness of retrieval strategies.
The ideal conditions would be those corresponding to unity recall and unity precision.
For instance, perfect recall can always be achieved by retrieving an entire collection;
the precision, however, would then be extremely low. On the other hand, if the number of
retrieved documents is very small, the precision might be unity, but the recall would be
very low. This illustrates that the combination of recall and precision must be considered
in the evaluation. A strategy is considered to be more effective if its plot of precision
vs. recall is described by a curve closer to the ideal point of precision = 1 and recall
= 1. Examination of the literature indicates that, in this respect, joint recall-precision
retrieval effectiveness improves as well chosen, more sophisticated language processing
techniques are applied, or as the retrieval process is carried out on-line, interactively
employing a greater choice of association terms.

4. CONCLUSION

The cost and staffing that are demanded of a library that desires to offer effective
retrieval services are currently very large. Many smaller libraries try to use cataloguing
and indexing material generated in large information centres, but even utilization of such
resources requires considerable staff and cost. These smaller libraries may be the real
beneficiaries of a total on-line storage and retrieval facility as described in this
paper. The state of the art indicates that such a system is feasible and economical to
develop at this time.

DISCUSSION


J.R.  Weiner:  What  precision  have  you  obtained  in  your  retrieval,  and  what  do  you  hope  to 
attain? 

N. S. Prywes: My work is with a large collection of documents and satisfactory tests for
precision  of  retrieval  from  large  collections  are  still  required. 

Professor  Salton,  working  with  a  smaller  collection  (Reference  2)  has  obtained  precision 
of  greater  than  60  per  cent. 


H. F. Vessey: I am disturbed at the statement that terms occurring infrequently are
eliminated  in  thesaurus  compilation.  Terms  such  as  project  titles,  names  etc.,  might 
only  occur  once  or  twice  but  be  very  powerful  terms  for  retrieval  purposes. 

N. S. Prywes: The list of low frequency words is comparatively short, only a few hundred
words.  This  list  could  be  weeded  manually  to  retain  such  terms  as  might  be  useful  in  a 
search. 


A. H. Holloway: You have mentioned a precision of about 20 per cent and that Professor
Salton expects to achieve 80 per cent, but have not mentioned the recall. It is not
difficult to achieve a precision of 100 per cent with a very low recall. Can you say
what combination of these criteria you hope to achieve?

N. S. Prywes: We are investigating whether our users want good recall or good precision as
alternatives.  For  teaching  and  research  it  is  often  acceptable  to  have  high  precision 
with  fairly  low  recall. 


R.  Bree:  Could  you  please  say: 

(1)  The  number  of  documents  used  in  the  trials  of  the  system. 

(2) From what part of the text is the descriptor material extracted.

(3)  What  is  the  computer  economy  of  this  method  of  mechanical  text  analysis. 

N.  S.  Prywes: 

(1)  A  collection  of  6,000  documents  obtained  from  the  Department  of  the  Air  Force. 

(2)  About  10,000  words  have  been  extracted  from  the  titles  only  of  the  document. 

We are proposing to test the system on a larger collection of documents in Nuclear
Science Abstracts, using titles, Universal Decimal Classification headings and
Defence Documentation Centre Descriptors.

(3)  Systems  must  be  worked  on  a  serial  process  such  as  magnetic  tape. 

The classification has been organised into a tree at three levels. It takes three
passes of tapes for the entire collection to obtain material for each level. Each
pass takes about one hour. Updating is carried out monthly and takes about ten
hours. Daily updating was tried but this did not produce sufficient new entries.

SUMMARY


Several problems involving non-numerical mathematics are listed. In the
field of non-numerical data processing, the following topics are discussed
briefly: Group theory; Games theory; Translation; Graph theory;
Pattern recognition and enhancement.

NON-NUMERICAL MATHEMATICS AND DATA PROCESSING
F. Krückeberg


1.  NON-NUMERICAL  MATHEMATICS 

Many problems of modern mathematics are non-numerical in structure. It is a simple
matter to calculate the path of a rocket numerically, based on the theory of differential
equations. This subject properly belongs to Analysis. In contrast, the topological
structure of cyclic satellite orbits is non-numerical. It is possible to classify types
of orbits topologically. (See Fig. 1).

There are mathematical areas which contain no numerical components as, for example,
graph and network theory. With the help of these theories, complicated problems of
strategy can be analysed. In the theory of games, graphs allow an intuitive grasp of the
problem to be readily achieved. There is a close correspondence between graph theory and
logic. Graphs can also be described in terms of Boolean matrices. With this, a link to
algebra emerges. A special topic in algebra is Group Theory, which can be used for the
investigation of graphs. The importance of Group Theory is, however, much greater than
this and more general. For example, one can, with the help of groups, describe the
symmetric properties of elementary particles.
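
The description of a graph by a Boolean matrix can be made concrete with a short Python sketch. The example graph is invented, and Warshall's algorithm is used here only as one familiar way of deriving, by purely logical operations on the matrix, which nodes can be reached from which; the text does not single out any particular algorithm.

```python
def transitive_closure(adj):
    """Warshall's algorithm on a Boolean adjacency matrix (list of lists)."""
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                # Anything reachable from k is also reachable from i.
                for j in range(n):
                    reach[i][j] = reach[i][j] or reach[k][j]
    return reach


# Directed graph with edges 0 -> 1 and 1 -> 2, and an isolated node 3.
adj = [
    [False, True, False, False],
    [False, False, True, False],
    [False, False, False, False],
    [False, False, False, False],
]
reach = transitive_closure(adj)
print(reach[0][2], reach[2][0])
```

No arithmetic on numerical quantities is involved; the whole computation consists of "and" and "or" operations on truth values.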

Many problems in geometry are non-numerical in nature. Hilbert's research on the
foundations of geometry is especially worthy of note here. A further important subject
in mathematics is logic. This topic is, at present, being very actively pursued. It is
possible to prove that large classes of problems can be solved without direct consideration
of the individual problems through the utilisation of very broad logical generalities.

An extension of this idea leads to the new topic of model theory. Modern mathematical
theory is becoming ever more generalised and distant from the classical world of numerical
analysis.


2.  NON-NUMERICAL  DATA  PROCESSING 

The field of non-numerical data processing is so large, if one regards it with full
generality, that no list of subjects can be exhaustive. One can merely cite several
important new efforts without prejudice to the large number of others not mentioned.

2.1  Group  Theory 

It is possible to store finite groups in the core store and all group operations can
be described in subroutines. In this way it is possible to manipulate groups in computers.
For example, all sub-groups can be determined automatically and even more complicated
problems of group theory can be solved very conveniently. In this domain, it is
certain that very interesting new results will be obtained. (See Fig. 2).
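
A minimal sketch of this idea in Python: the group is stored as its multiplication (Cayley) table and a routine determines all sub-groups by testing subsets for closure. The brute-force search, the convention that element 0 is the identity, and the choice of the cyclic group of order 6 as the example are assumptions for illustration; practical programs would use far more economical methods.

```python
from itertools import combinations


def subgroups(table):
    """List all subgroups of a finite group given by its Cayley table.

    table[a][b] is the product of elements a and b; element 0 is assumed
    to be the identity. Brute force, so only suitable for small groups.
    """
    n = len(table)
    others = range(1, n)
    found = []
    for size in range(n):
        for extra in combinations(others, size):
            subset = {0, *extra}
            # A non-empty finite subset closed under the operation is a subgroup.
            if all(table[a][b] in subset for a in subset for b in subset):
                found.append(sorted(subset))
    return found


# Cyclic group of order 6 under addition modulo 6.
z6 = [[(a + b) % 6 for b in range(6)] for a in range(6)]
z6_subgroups = subgroups(z6)
for subgroup in z6_subgroups:
    print(subgroup)
```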

2.2  Games  Theory 

It  is  well  known  that  it  is  possible  to  program  a  computer  to  play  a  complex  game  like 
chess.  Since  the  theory  of  games  is  of  the  greatest  importance  for  scientific  management 
and  logistics,  game  playing  in  the  computer  becomes  a  very  serious  occupation  indeed. 

This application of the computer will be, in future, one of the most important. A special
possibility is the construction of time-tables by computers.

2.3 Translation

Machine  translation  is  of  considerable  importance  due  to  the  continued  growth  of 
international  scientific  exchange. 

In the area of language data processing the German "Forschungsgruppe LIMAS"
(Forschungsgruppe Linguistik und maschinelle Sprachübersetzung) has shown how flow
diagrams can be used for the purpose of machine translation. Here language is viewed as a
process, i.e. a system, in which information is converted into speech signs. This process
operates on three main levels.

The first level is called the "nomo-sphere". It is composed of "Inhalt"-factors
(semantic regulating factors; semantische Steuerungsfaktoren) which alternately
integrate with and modify one another.

The second level is a parallel system which portrays the "morpho-sphere" of the
language. This portrayal is brought about by "formale" rather than "Inhalt"-factors.
Linear cohesions or single factors or factor groups seldom occur between these two levels.

Cohesive relationships between levels 1 and 2 are built by a system of combinations and
ramifications. This level is called the "nomo-morpho interaction bridge" (Wechselwirkwerk).

All cohesions at all three levels - the morpho-sphere, the nomo-sphere and the
morpho-nomo interaction bridge - are linguistically determined and selected.

A structural picture of such a system of relationships at the morphospheric level is
illustrated in the accompanying flow diagram. It is an example of the system of relations
existing between all three levels. The range between the two levels, morpho-sphere and
nomo-sphere, is the nomo-morpho interaction bridge. This structural picture appears also
in all subprograms.

The  function  of  such  programs  is  reversible  and  the  nomo-factors  and  the  morpho-factors 
are  information  carriers. 

The goal of this system is to portray a formal image of the communications process
called "language", i.e. with the same grammar and flow system, synthesis and analysis
(speech and understanding) can be carried out.

Dr. Hoppe calls his system "Kommunikative Grammatik". It is characterised by the
following principles:

(a)  the  reversibility  of  functions 

(b)  the  information-retention  of  the  regulating  factors 

(c)  the  functioning  of  the  factors 

(d)  the  integrating  of  the  factors 

(e)  the  binary  structure  of  the  functions 

(f)  the  nomo-morpho  interaction  bridge 

(g)  determination 

(h)  selectivity 

(i)  the  operation  of  the  process  in  time. 

Such a non-numerical treatment of data belongs to the area of data processing, not to
non-numerical mathematics. This treatment contains a working theory which allows, without
the help of logistical functions, the generation and transformation of the process, the
explication of factors which are not morphologically represented and the verbalisation of
these factors, that is, their expression in grammatical forms.

In this way a system for the direct coordination of signifier and signified is replaced
by a complex system of functions of numerous semantic and formal regulating factors.

Only  when  language  is  treated  in  this  way,  as  a  regulated  process  and  as  a  system  of 
functions,  is  the  way  paved  for  high-quality  machine  translation. 

By means of the above mentioned factor characteristics (information carrier, binary
functionality, reversibility, reciprocal integration, selectivity, determination) a great
number of the so far unconquered problems can be solved in the process of translating,
problems among which the ambiguities, the indefiniteness and the implied information are
the most important.

Translation connects the factor-process-system of two languages by way of a factor
formula which contains the regulating principle for each of the sentences to be translated,
as it is machine-analysed in the input language and as required for synthesis of the
target language in its regulating process. The translation is in this case equivalent in
meaning (to the original); however, it need not always be comparable in form, that is, in
its syntax. The LIMAS-system has already been thoroughly explained in a number of
publications. (See Fig. 3).

A Russian-German translation project is being carried out in collaboration with the
Deutsche Forschungsgemeinschaft (German Research Association) and the University of
Saarbrücken (See "Systran System" - P. Toma).

2.4  Graph  Theory 

Very complicated graphs can be stored in the computer, and the structure of the graph
can then be investigated. It is possible, for example, to determine the shortest
connection between two nodes of the graph. Further, cyclic sub-graphs can be discovered. Such
questions are of the greatest practical interest. Techniques including signal-flow
graphs and k-trees allow one to obtain a clear intuitive picture of the functioning of a
linear electrical network, after which analysis is much easier. Knowledge of the graph
in topological network analysis eliminates many time consuming mesh and node calculations.
In the chemical industry, the flow of material can be described by these methods. This
is of considerable importance for the solution of management problems in such large
factories. A large German chemical factory is, at present, actively using this technique
in its daily operation.
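
Finding the shortest connection between two nodes of a stored graph can be sketched in Python as below. Breadth-first search over adjacency lists is assumed here, with "shortest" taken to mean the fewest edges; the sample network and the function name are hypothetical, and a weighted network would call for a different algorithm.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search for a path with the fewest edges, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the recorded predecessors.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical undirected network stored as adjacency lists.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
path = shortest_path(graph, "A", "E")
print(path)
```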

2.5  Pattern  Recognition  and  Enhancement 

The recognition of shapes is a very difficult and interesting problem whose solution
has many applications. The problem can be divided into two parts. One is the decision
as to which class out of a large number of possibilities a given well defined pattern
belongs (character recognition, for example). The other is concerned with improving the
definition in patterns which are greatly disturbed by extraneous influences and, given a
limited number of classes, deciding into which such patterns most probably fall. For
example, in the latter category, in collaboration with the Rheinisches Landesmuseum,
Labor für Feldarchäologie, a project on the enhancement of buried archaeological monuments
seen in the results of surface geophysical measurement is in progress. Although the
method requires much numerical manipulation of the data, the end result must be presented
in a form which enhances the ability of the human eye to distinguish faint shapes in a
noisy field (See Fig. 4).

Form, pattern structure, logic trees, graphs, language, algebraic manipulation, all
of these are but a few of the non-numerical problems which are yielding to the attack of
non-numerical mathematics and data processing, giving new results in areas where hitherto
insurmountable difficulty prevailed.

Figure 1. Orbits of a satellite between earth and moon

DISCUSSION


Lustig: I would like to raise three points:

(a) Language translation does not seem to be a very good example of non-numerical
mathematics.

(b)  Non-numerical  mathematics  seems  to  have  a  very  limited  use  in  the  documentation 
field. 

(c)  There  seem  to  be  many  areas  of  mathematics  which  are  amenable  to  computer  solution 
but  this  does  not  seem  to  be  a  common  practice. 

F. Krückeberg: It is probably true that the use of non-numerical mathematics in
documentation studies is limited at present but more extensive use may be made of the technique
in the future.

Regarding  the  solution  of  mathematical  problems,  there  is  plenty  of  scope  for  the 
application  of  computers;  at  Bonn  University  many  mathematical  problems  are  solved  in 
this  way  already. 

N. S. Prywes: Can you comment on the use of graph theory methods for simplifying a diagram
of a thesaurus?

F. Krückeberg: Established theories and methods exist for reducing graphs and electrical
networks and these may be applicable to this problem.


SUMMARY 


The organization, functions and systems used at TDCK are described.
TDCK collects, evaluates and stores information primarily useful for
military purposes. The retrieval systems used are Universal Decimal
Classification and the TDCK-Compact System. The TDCK Thesaurus has
been designed such that related concepts are placed on concentric
circles; arrows, fanning out in all directions, are used to display
relationships between the descriptors. At the "input", coding will
start from the centre of the circle, following an arrow until the
wanted descriptor is reached. A total of 376 circle-schemes have
been designed so far.


MANUAL  SYSTEMS  -  TDCK  CIRCULAR  THESAURUS  SYSTEM 


J.A.  Schuller 


1.  ORGANIZATION  OF  TDCK  (The  organization  is  shown  in  Fig.  1.) 

The centre falls directly under the Minister of Defence. An advisory council consisting
of four members advises the Minister, at his request or on their own initiative, on TDCK
policy matters.

Three  of  the  members  are  high-ranking  officers  of  the  Navy,  the  Army  and  the  Air  Force 
and  the  fourth  member  is  the  director  of  TDCK. 

TDCK  personnel  strength  is  63  people. 

Following  the  scheme  of  Fig. 1  we  see  at  the  left-hand-side  the  technical  divisions  and 
at  the  right-hand-side  the  special-library  department  and  the  administration  division. 

The  technical  division,  subdivided  in  sections,  is  manned  by  scientists  and  technical 
engineers,  in  total  25,  nineteen  of  whom  hold  an  academic  master's  degree,  while  the 
remainder  have  bachelor  degrees  or  are  senior  serving  officers. 


2.  FUNCTIONS 

The  primary  functions  of  TDCK  are: 

(a)  To  collect,  evaluate  and  store  new  scientific  information,  from  all  over  the  world, 
which  may  be  useful  for  military  purposes  in  general. 

(b) To be well informed of highly specialized information sources inside and outside the
country.

(c) To use available information for giving assistance to those scientists, technical
investigators and officers, who are involved in solving problems in research,
technology, education, management, military sciences and other fields of military
interest.

Before explaining how we try to fulfil our functions, and describing some details of
specific activities, I should like to stress that obviously a documentation centre is a
model for applied efficiency. Its charge is to use available information, to inhibit
costly duplication and to select the most effective and efficient modern methods for
achieving these aims.

The most simple and direct definition of documentation which has come my way lately
runs thus:

"Documentation  embraces  the  logistics  of  knowledge" 

Whether we achieve this communication or transmission of information by hand-sorting
activities - more or less mechanized - or by a computer, does not affect the functions of
a documentation centre.

The first-mentioned function, collecting new scientific and technical information,
includes the evaluating factor and both are dependent on the fields of interest, and the
degree of specialization within these fields, of the institution for which one is working.

It may be clear that TDCK's charge, "for military purposes in general", encompasses a very
broad area, and I refer again to Fig. 1 where this coverage is indicated.

Collecting  scientific  and  technical  data  useful  for  defence  purposes  is  one  of  the 
intricate  activities  of  the  centre.  Firstly  one  should  know  what  is  wanted;  secondly  what 
is  to  be  had  and  where. 

After some 15 years TDCK has received reports from more than 6000 different research
institutes working for defence and spread over the 15 NATO nations. These include all types
of report-producing institutions; e.g. laboratories of research establishments, of
universities and of industries. In certain cases TDCK has succeeded in being put on
mailing lists, which is exceptional for a foreign centre. Other institutions send their
accession lists from which reports may be requested. In many instances TDCK receives
technical reports without cost, but often also in exchange for TDCK publications, while
other series are made available to TDCK against reproduction costs. The bulk of the reports
acquired  by  TDCK  are  received  from  the  United  States,  but  significant  contributions  come 
from  the  UK,  Canada,  W-Germany  and  AGARD.  TDCK  maintains  close  contacts  with  the  national 
defence  documentation  centres  in  NATO  countries;  contacts  which  are  encouraged  by  the  AGARD 
Technical  Information  Panel  membership. 

A  network  has  been  constructed  which  actually  connects  the  information-centres  in  NATO 
and  occasionally  in  some  neutral  friendly  countries.  At  the  same  time  direct  contacts  are 
maintained  with  several  special  institutions  in  some  of  these  countries  (see  Function  2(b)) 
resulting  in  a  regular  exchange  of  reports. 

In order to minimize duplication of research, TDCK will buy keys to the research
literature where possible. It subscribes to “Physics Abstracts”, to “Environmental Effects on
Material and Equipment Abstracts”, to “Excerpta Medica”, to “Index Aeronauticus”, to
“Meteorological and Geoastrophysical Abstracts”, etc. In total we have over 30 different
subscriptions  of  this  kind.  These  abstracts  are  considered  to  be  the  backbone  of  our 
information  sources,  and  only  supplementary  work  is  needed,  keeping  in  mind  that  many  of 
these  abstracting  services  are  not  up  to  date  -  running  behind  from,  say,  three  months  to 
two  years  -  and  moreover  do  not  cover  all  publications  e.g.  symposia-papers,  patents  and 
unpublished  reports. 

Interesting  new  articles  and  many  unpublished  documents  culled  from  many  sources  are 
selected,  abstracted,  and  published  in  our  monthly  literature  digests.  Each  scientific 
section  at  TDCK  composes  its  own  digest  so  that  we  publish  20  different  literature  digests 
per month, three of which are issued every two weeks (namely “Electronics”, “Aeronautics”,
and “Economics”). Perhaps I should mention also that in some subject areas we are working
in co-operation with other documentation centres, in so-called “Pools”, preparing abstracts
for  common  use;  our  unclassified  digests  are  also  circulated  to  interested  parties  outside 
defence. 

All  reference  material  published  in  our  literature  digests  is  entered  in  our  indexing 
and  retrieval  systems  and  is  available  in  hardcopy.  For  defining  the  contents  of  the 
literature  and  for  specialized  retrieval  search  TDCK  is  using  two  different  systems;  namely 
the  UDC  (Universal  Decimal  Classification)  and  the  TDCK-Compact  System. 

Apart  from  these  systems  an  index  is  kept  for  retrieving  reports,  papers  etc.  according 
to their issuing organizations: an institute, a laboratory, etc.

In some cases the information scientist, or the questioner, is aware of specific
long-term research projects which are undertaken by one or more well known laboratories. In such
cases he may find useful data at short notice in this index. For example the answer to a
question concerning “theory on 3, 4 or 5 bladed supercavitating propeller performance” may
be quickly found under TMB (Taylor Model Basin, Wash. DC), since it is known that TMB is
working in this field.

Part  of  this  retrieval  activity  is  mechanized  by  a  Lectriever,  which  is  a  mechanically 
driven  file  selector. 

Now I propose to turn to Figure 2. Here is shown our Table to Sources of Information, a
list which is intended to remind our information officers of all sources available at TDCK
which should be consulted when, for instance, a selective bibliography has to be compiled.

Along the top are found (vertically printed) the different scientific and technical areas
which are of interest to the M.O.D.; i.e. the same broad subject areas as shown in Fig. 1.

In  the  column  at  the  left  we  distinguish  three  different  groups  of  information  sources: 

1.  Card  Catalogue  Systems 

2.  Books  of  Reference 

3.  Abstracts  Journals. 

If, for example, a request for a bibliography on “Inertial Navigation Systems” has been
received, 18 different sources will have to be consulted according to this schedule, all of
them packed with information and all of them using their own indexing system. This last
remark suggests one of the reasons why TDCK does not feel like using a computer for its own
system; some 35 other systems would still have to be scanned in a conventional manner.

Coming to the third function of TDCK we arrive at the main objective for which a defence
documentation centre is established, namely to provide information to those who need to know.
Technical questions which are put to TDCK are handled according to the needs and the wishes
of the questioner, but only, of course, as long as TDCK is able to provide specialists' time and
material. In principle a question can be answered in four different ways:

(a)  By  making  available  a  selection  of  reports  or  articles  in  which  the  problem  has  been 
treated. 

If the questioner knows the title of such reports the request is a very easy one to
handle and is limited to routine library action; if not, the centre has to find the
required information by one of its retrieval systems.

(b) By making available a bibliography consisting of titles and descriptive abstracts of
all available printed information, all well indexed and cross-referenced.

When  such  a  bibliography  has  to  be  compiled,  all  available  sources  in  TDCK  have  to 
be consulted (see Fig. 2).

(c)  Through  the  production  of  a  literature  research  report  or  a  “state  of  the  art”  review, 
in  which  the  information  scientist  provides  a  survey  of  the  latest  developments  in 
the  relevant  subject  area.  Requests  for  such  studies  arrive  more  and  more  frequently. 

The information centre of today is already manned by a university trained staff of
engineers  and  doctors  with  linguistic  abilities,  for  selecting  the  literature,  for 
making  descriptive  abstracts  and  for  classifying. 

It  is  a  very  attractive  part  of  the  literature  analyst's  job  to  make  literature 
searches,  and  is  undoubtedly  a  highly  responsible  scientific  task  which  -  in  my 
opinion  -  will  never  be  accomplished  by  a  computer. 

Of  course  these  extensive  special  studies  can  only  be  made  when  enough  time  is 
available. 

(d)  By  offering  research  workers  a  chance  for  interaction  with  the  literature  in  their 
specific  narrow  field  of  interest.  In  other  words  providing  facilities  which  permit 
effective  browsing;  in  my  opinion  a  very  necessary  activity. 


The number of complex questions which entailed much work exceeded 700 in 1967. Less
intricate questions amounted to about 1400, while requests for copies of specific
reports or articles totalled more than 50,000 during last year.


3.  SYSTEMS 

Coming to the systems which TDCK uses for indexing and retrieving the literature by
subject,  I  should  like  to  give  you  an  idea  of  the  philosophy  of  TDCK’s  so-called  Compact 
System. 

Before doing so I have to try to convince you that a documentation and information centre
is constantly confronted with very specific difficulties. A well known example is that a
visiting scientist or research worker frequently does not know what he is actually looking for,
and finds himself unable to formulate his information need clearly. The only thing we can
do in such a case is to confront him with a scientist from our staff, a colleague who
understands his language (his jargon), if not his problem, and who is able to lead the enquirer
to a manual system in which he can browse. The big questions in information systems - on
the documentation side - are always “will we get out all that we have put in?” and “did we
put in properly what we have?”

For many years TDCK has used two different systems for retrieval and, to anticipate a
logical question, I want to stress that there is no system in the world today which will
give a 100 per cent output. The application of two systems which differ fundamentally in
their  nature  and  philosophy,  puts  at  our  command  the  sum  of  the  possibilities  inherent  in 
each  of  the  two  systems. 

Of  course  this  means  too  that  more  work  has  to  be  done. 

Experience, however, indicates that this “more work” has to be done on the input side and
that,  in  most  cases,  one  will  meet  with  less  work  and  certainly  more  completeness  on  the 
output  side.  Moreover,  when  handling  two  systems  for  indexing  all  documents,  it  is  possible 
to  pose  identical  questions  to  both  systems  and  to  learn  why  a  certain  document  was  not 
retrieved  by  one  of  the  two  systems.  In  this  way  research  will  pinpoint  the  weaknesses  of 
both  systems. 
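The comparison just described can be illustrated by a small sketch. It assumes, purely for illustration, that each system answers a question with a set of report numbers; the function name and the report numbers are hypothetical and are not taken from TDCK practice.

```python
def compare_systems(hits_a, hits_b):
    """Split the answers of two indexing systems to one identical question."""
    hits_a, hits_b = set(hits_a), set(hits_b)
    return {
        "found_by_both": hits_a & hits_b,
        "missed_by_b": hits_a - hits_b,   # candidates for studying weaknesses of system B
        "missed_by_a": hits_b - hits_a,   # candidates for studying weaknesses of system A
        "combined": hits_a | hits_b,      # the "sum of the possibilities" of both systems
    }


# Hypothetical report numbers returned for the same question.
result = compare_systems({"R-101", "R-102", "R-105"}, {"R-102", "R-105", "R-109"})
print(sorted(result["missed_by_a"]))
print(sorted(result["missed_by_b"]))
```

Each report that appears in only one of the two answers is a case worth examining at the input side, since it shows where one system failed to describe the document in a retrievable way.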

The results of such research have led, in our case, to the design of a new system which
should replace the uniterm system of coordinate indexing, which has shown some serious
deficiencies.

This  new  system  is  called  the  TDCK  Circular  Thesaurus  System.  It  was  considered  that 
certain  features  of  several  well-known  systems  are  very  useful  and,  when  possible,  should  be 
incorporated  in  our  new  thesaurus  conception. 

Most of the ideas used in the TDCK-system are not new at all. In fact, what is new
is  that,  if  properly  used,  the  visible  display  of  a  systematically  built  thesaurus  compels  a 
person  to  retrieve  with  the  same  terms  as  used  at  the  input  activity.  This  aspect  is 
perhaps  the  most  significant  one  of  the  TDCK  manual  system;  the  graphical  display  of  the 
scientific  sub-divisions  of  a  discipline  introduces  a  third  dimension  to  the  thesaurus. 

When designing this thesaurus we consciously sought to obtain a combination of:

1.  a  systematic  subject  set  up; 

2.  alphabetical  arrangement  of  descriptors; 

3.  coordinate  indexing  principle; 

4.  mutual  relations  and  facets;  and  finally 

5. visible directions display.

We believe that in the new TDCK Circular Thesaurus method such a combination has been
achieved.

I  shall  now  try  to  give  a  description  of  what  we  have  called  a  simple  circle-scheme. 

Notions which are related to each other, and which can be placed in the familiar
family-tree pyramid, are placed on concentric circles; that is to say we have simply made a
plan-view of our tree because on a circle we have, in practice, about five times more room than
in a pyramid structure. Arrows, fanning out in all directions, are used to display
relationships between concepts. An example of a circle-scheme is shown in Fig. 3; in use a specific
procedure is practised - this is described in some detail below. Briefly, when arrows are
followed from the origin of a circle-scheme we see how notions fall into logical
sub-divisions. A relationship is sought which, depending on needs, can be continued on a
following concentric circle, and so on.

When  descriptors  from  other  circle-schemes  have  to  be  used  for  describing  a  document 
properly, such notions can be “borrowed”, but should immediately be added (in writing) to
the circle-scheme of current interest, outside the circle. All such “complemented”
circle-schemes  are  formally  published. 

The  individual  circle-schemes  which  have  been  built  up  are  thus  kept  limited,  and  arrows 
will refer us to other circles when we are entering their domain. It may be observed
also in Fig. 3 that in all cases these arrows point to “borrowed” descriptors which are not
framed.  The  word  at  the  centre  of  each  circle-scheme  is  usually  a  descriptor  with  a  high 
frequency  count  in  the  system. 

The following rules are in force:

1. The thesaurus consists of descriptors;

2. Each framed descriptor appears only once in the system;

3.  At  the  input,  coding  will  start  from  the  centre  of  a  circle,  following  an  arrow  until  the 
wanted  descriptor  has  been  reached.  In  one  circle,  more  than  one  radius  may  be  followed; 

4. All descriptors encountered will be noted down.

When  a  descriptor  from  another  circle  has  to  be  used,  we  can  “borrow"  such  a  notion  and 
add  it  to  our  circle-scheme,  outside  the  ultimate  circle,  in  which  case  we  do  not  use  a 
frame. 

5.  New  descriptors  have  to  be  defined  by  the  subject-specialist  concerned.  They  will  be 
added  officially  to  one  of  his  circle-schemes  after  which  they  can  be  “borrowed”  by  other 
circle-schemes; 

6. On the retrieval side the circle-schemes will always be used. The user (a specialist in
the field) will be led automatically to the pertinent descriptors, once the appropriate
scheme has been selected.

Only two or three descriptors are necessary for defining the question; in other words,
a document coded by 20 descriptors can give an answer to 10 different questions.

7.  The  thesaurus,  as  a  one-language  technical  index,  can  be  translated. 

The  use  of  homonyms  and  synonyms  is  avoided.  The  word  “measurement”,  for  example,  will 
be  used  in  several  descriptors,  which  could  read:  measurement  of  time,  measurement  of 
distance,  ballistic  measurement,  etc.  In  all,  376  circle-schemes  have  been  designed  to 
date. 

The number of descriptors for the TDCK fields of interest (defence) is about 11,500. It
is expected that this number will gradually increase to no more than 12,000. Accessions of
essential descriptors are reported to the System-Manager, who publishes, usually every year,
up-dated schemes to replace old ones. The fourth edition of the TDCK Circular Thesaurus was
published in 1966; the fifth edition is in print at this moment; it will contain almost 400
separate  circle-schemes.  It  is  easy  to  re-arrange  any  scheme,  if  this  is  desirable  or 
necessary,  as  for  example  when  the  philosophy  in  a  particular  scientific  field  is  subject  to 
change.  The  sequence  of  the  descriptors  along  one  radius,  however,  is  never  subject  to 
change. 

Summarizing: Hierarchically related thesaurus terms are arranged within a series of
concentric circles, with the most generic term at the origin. Arrows radiate outward to
specific terms on the first circle and from some of these terms to successively more
specific terms on succeeding circles.

Fig. 4 is taken from page 100 of the Thesaurus where the division Operational Research
has been subdivided in 15 so-called “descriptor fields”.


If, for example, we receive a document in which the aspects of a tactical air defence
O.R. game are discussed it will of course be passed to our O.R. division for abstracting
and indexing. Here the proper descriptor field will be chosen (Fig. 4). Our specialist
will turn to scheme 112 [Operational game] and on this page the descriptor field shows a
plan view of the subject [Operational game] (Fig. 3).

Now, starting from the centre, an arrow is followed downward to [tactical game], further
to [defensive game], and from there to the descriptor air defence which has been “borrowed”
from another circle-scheme (scheme 52) and hence has not been framed.

If, in the same report, the subject of a [reconnaissance game] is treated this descriptor
will be added also.

Scanning along the second circle the descriptor [air game] will be noted down as well.

When this has been done the report has been defined by 6 descriptors.
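The input coding of this example can be sketched as follows. The fragment of scheme 112 used here is an assumption: Fig. 3 is not reproduced in the text, so the position of [reconnaissance game] on the first circle and of [air game] under [tactical game] is a guess made only to give the sketch something to walk over. The function names are likewise hypothetical.

```python
# Assumed fragment of circle-scheme 112: each descriptor maps to the
# descriptors its arrows point to. The real layout of Fig. 3 may differ.
SCHEME_112 = {
    "operational game": ["tactical game", "reconnaissance game"],
    "tactical game": ["defensive game", "air game"],
    "defensive game": ["air defence"],
}
BORROWED = {"air defence"}  # borrowed from scheme 52, hence not framed


def path_from_centre(scheme, centre, target):
    """Return the descriptors met when following arrows from centre to target."""
    if centre == target:
        return [centre]
    for child in scheme.get(centre, []):
        tail = path_from_centre(scheme, child, target)
        if tail:
            return [centre] + tail
    return []  # target cannot be reached from this descriptor


def code_document(scheme, centre, wanted):
    """Note down every descriptor encountered on the way to each wanted one (rules 3 and 4)."""
    noted = []
    for target in wanted:
        for descriptor in path_from_centre(scheme, centre, target):
            if descriptor not in noted:
                noted.append(descriptor)
    return noted


descriptors = code_document(
    SCHEME_112,
    "operational game",
    ["air defence", "reconnaissance game", "air game"],
)
print(descriptors)
print(len(descriptors))
```

Under the assumed layout the walk notes six descriptors, in agreement with the count given above; more than one radius is followed within the same circle-scheme, as rule 3 allows.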

If at a later stage a question related to this subject is received then only two of the
assigned descriptors would bring this report forward. For instance the codes for [air game]
and [tactical game] would suffice.


The  visual  display  of  the  descriptors  compels  the  use  of  the  prescribed  thesaurus  terms  at 
the output as well as at the input activity.
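Retrieval by two descriptors is an instance of coordinate indexing, which may be sketched as the intersection of posting lists. The sketch below is a minimal illustration and not the TDCK card procedure; the report names and the descriptors assigned to reports B and C are invented for the purpose.

```python
from collections import defaultdict


def build_postings(documents):
    """Post each document under every descriptor assigned to it at input."""
    postings = defaultdict(set)
    for doc_id, descriptors in documents.items():
        for descriptor in descriptors:
            postings[descriptor].add(doc_id)
    return postings


def retrieve(postings, question):
    """Return the documents posted under all descriptors of the question."""
    hit_sets = [postings.get(descriptor, set()) for descriptor in question]
    return set.intersection(*hit_sets) if hit_sets else set()


# Report A carries the six descriptors of the example; B and C are hypothetical.
documents = {
    "report A": {"operational game", "tactical game", "defensive game",
                 "air defence", "reconnaissance game", "air game"},
    "report B": {"operational game", "tactical game", "defensive game"},
    "report C": {"operational game", "air game"},
}
postings = build_postings(documents)
print(retrieve(postings, ["air game", "tactical game"]))
```

Only report A is posted under both descriptors, so the two terms suffice to bring it forward, provided the question uses exactly the terms that were noted down at input.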


The discussion on this paper follows on page 110.

Figure 3. Circle-scheme OPERATIONAL GAME

Figure 4. Descriptor fields of the division OPERATIONAL RESEARCH

100 MATHEMATICAL STATISTICS
104 OPERATIONAL RESEARCH
110 GAME THEORY (MONTE CARLO)
112 OPERATIONAL GAME
114 EFFORT DISTRIBUTION
116 INVENTORY CONTROL
122 PROBABILITY CALCULUS
123 HITTING PROBABILITY
125 SYSTEMS ANALYSIS
129 AVIATION MATHEMATICS
135 MEDICAL OPERATIONAL RESEARCH


DISCUSSION 


H.A. Stolk: Why does TDCK use a letter-number-letter code instead of a purely numerical
coding system?

J.A. Schuller: The use of this notation makes it possible for us to post a total of 12,500
descriptors on 1,000 cards.


C.O.Vernimb:  What  is  the  annual  input  of  documents  into  your  system? 
J. A. Schuller:  The  input  is  40,000  to  50,000  documents  per  year. 


S.C. Schuler: (a) Do you find it practicable, using your manual system, to send copies of
abstracts direct to groups of scientists on an SDI basis?

(b) Do you use microfiche as a means of sending out documents? What is
the reaction of users?


J.A. Schuller: (a) We do not have an SDI system but we do send information to workers on a
continuing basis if we know they are interested. One disadvantage of
this is that they neglect to let us know when their interest in the
subject ceases.

(b) We do have many microfiche but we do not make very much use of them
because of lack of good reading equipment. Users still prefer hard copy
reports.


The view is put forward that the handling of large document files
requires mechanization and that even processes such as document analysis
for input, question analysis for retrieval and retrieval result evaluation,
must eventually succumb to machine treatment. Keypunching of computer
input presents particular problems. The solution could be optical scanning
if standardized print formats were used in document production. The direct
interrogation of the machine file by remote visual display consoles is an
inevitable development. The ESRO/ELDO Documentation Service hopes to have
a system available in Europe early in 1969. Ultimately such consoles would
be installed at strategic points throughout the European network of ESRO
establishments.


MECHANICAL SYSTEMS
N.E.C. Isotta


1.  INTRODUCTION 

Until not very long ago, perhaps only four or five years, the authors of most
papers on mechanised systems of information retrieval were mainly concerned with one
of two things: in very general terms, either justifying the commencement of a machine
system, or attempting to prove that, having started such a system, the results were
worthwhile or at least as good as expected. Nowadays the atmosphere in these matters is
rather different, since most people, and by this I mean both the customer and the
supplier, have realised and accepted the need for mechanical methods of handling large
files. However, there is considerable divergence of views on the level at which
mechanisation really becomes necessary. Probably for report literature the figure could
be as low as 100,000 items. But in actual fact this is also partly a function of the
slow development of any form of standardised vocabulary. Since scientific and technical
development always produces a corresponding vocabulary, it becomes essential that
existing systems should lend themselves to adaptation without a large amount of manual
effort. Machine systems with built-in feedback principles are obviously necessary if we
are easily to keep up with developing technologies.

For non-information-conscious administrations, the “subjective threshold of acceptance
for a machine system” depends to a large extent on existing familiarity with large bodies
of material, or large numbers of items of any kind. For example, a motor manufacturer
used to large quantities of stocks of spare parts of 100,000 different items would
probably not be convinced of the necessity for mechanisation for a simple reports field
until the store could be described in terms of “nearly a quarter of a million”, i.e.
something over 200,000. On the other hand a manufacturer of nuts and bolts might well be
convinced at “almost 100,000 items”. Pressure from the potential user is rarely strong
enough,  or  well  organised  enough,  to  affect  the  situation. 


2.  THE  MACHINE  VERSUS  THE  PROFESSIONAL 

Eventually  a  certain  amount  of  time  is  usually  allocated  on  a  computer  which  is 
primarily  intended  for  other  purposes,  e.g.  payroll,  stock  control,  or  as  in  our  case, 
scientific data processing. It is here very often that the first troubles begin,
particularly if use of the system is sufficient to demand real time operation. There is
certainly complete acceptance of the fact now, that documentary processes are particularly
amenable to mechanisation in what are often known as “business activity” areas. Such
areas may be fairly easily defined; they include operations which can theoretically be
performed without the direct intervention of professional labour, even though such labour
may have been necessary initially to establish the operational procedures. These will
include such matters as stock control, catalogue or index printing, preparation of
announcement journals or accession lists, the establishment of “field of interest”
registers etc. The most important activities remaining which still require professional
attention are therefore document analysis for input, question analysis for retrieval, and
retrieval result evaluation. Speaking rather heretically, primarily as a documentalist,
and not as a user, it seems to me however inevitable that even these areas must eventually
succumb to machine treatment, simply because of the sheer weight of material involved, in
conjunction with the increased effectiveness of the machine systems available.


3. MACHINE INPUT AND COOPERATIVE SYSTEMS

The  main  problem  area  for  many  years  has  been  the  one  of  getting  the  documentary 
material  into  the  machine.  Keyboarding  in  one  form  or  another  has  remained  essential. 
Technically  speaking  it  has  been  possible  for  some  time  to  arrange  for  input  to  be  made 
directly to a computer without the necessity for such keyboarding operations. Economically,
however, such systems, which are normally accompanied by very large workload capacities,
have not been justifiable in circumstances where the capacity would never be fully taken
up. This is therefore still one of the most expensive, time consuming, and error ridden
parts of an integrated machine system. Optical scanning could be a solution if
standardised print formats were used in document production, in order to avoid an intermediate
keyboarding into such a standard type script.

The  ESRO/ELDO  Space  Documentation  Service  based  on  an  exchange  arrangement  with  NASA, 
is  one  of  the  first  ventures  of  the  kind  where  the  agency  receiving  the  machine  system 
i.e.  ESRO/ELDO,  is  also  responsible  for  the  provision  of  machine  input  to  the  system 
operated  by  the  supplying  agency  i.e.  NASA.  This  has  underlined  the  problems  mentioned 
above  and  has  certainly  indicated  the  enormous  advantages  which  could  be  gained  if 
greater  standardisation  in  this  area  could  be  achieved.  In  spite  of  the  difficulties 
involved,  however,  material  now  being  processed  by  ESRO  in  Paris,  according  to  standard 
NASA  procedures  is  about  to  be  fed,  through  the  medium  of  punched  paper  tape,  directly  to 
the  computer  system  responsible  for  the  Photon  production  of  the  NASA  STAR  journal.  I 
may  say,  that  there  is  great  satisfaction  in  both  the  NASA  and  ESRO  centres  at  the 
successful  outcome  of  this  operation,  which  has  been  made  possible  only  by  great  patience 
and  understanding  on  both  sides.  As  part  of  the  exchange  arrangement  NASA  has  generously 
made  available  to  ESRO/ELDO  its  total  machine  system  together  with  the  relevant  file  of 
information  on  magnetic  tape.  In  addition,  microfiche  of  a  large  number  of  the  items 
quoted  on  the  file  are  also  provided.  The  service  thus  provided  by  ESRO/ELDO  is  available 
to  both  ESRO/ELDO  staffs  and  to  authorised  users  in  Member  States,  and  members  of  Eurospace. 


4.  MACHINE  OUTPUT  AND  USER  REACTION 

It  is  clear  that  in  spite  of  the  advanced  computer  age  in  which  we  live,  there  has  been 
a general diminution of standards of production resulting from the use of computers, and
many of the users of machine document systems are accepting this with reluctance. The
computer manufacturer’s philosophy until quite recently has been that the advantages
inherent in machine processing in respect of time saving, and capacity, have outweighed any
disadvantages apparent in the final machine product. In my view, they have been totally
wrong. The manufacturers of such things as detergents can teach the computer manufacturer
a great deal concerning “eye appeal” and “packaging”. There are known cases where the
cost of the package is greater than the cost of the contents: one specific example outside
the detergent field is the can of water supplied on certain European flights. Even now
upper and lower case computer output is a rarity and is often associated with some other
extremely expensive off-line printing machine. However, by now, the user too should have
become somewhat more sophisticated in his reaction to the current standards of computer
printout. He should make the best of what is available since it is a retrograde step to
interpose between the computer output and the users some intermediate manual stage, be
it editing or the improvement of appearance of the output by some other printing or
reproduction process. I feel sure that what must be aimed at is a completely satisfactory
direct  computer  output;  but  certainly,  in  the  meantime,  the  user  must  overcome  his 
prejudices,  although  at  the  same  time  he  should  be  sufficiently  vocal  to  indicate  that  the 
result  is  not  really  pretty  enough  to  encourage  him  to  make  the  greatest  use  of  it. 

The question arises as to the best method of placing the user in contact with the body
of information available. In our case, apart of course from our own staffs, the contact
is through the medium of correspondence and telephonic communication with the documentalist
who is to pose the question to the computer. Contact is thus to a large extent remote.
Our experience with our own staff shows that in this respect it is difficult to match the
results achieved by personal interview between the user and the documentalist. Is there
therefore a substitute for such personal contact? Almost certainly the answer must be
direct interviews with the computer itself.


5. DIRECT USER ACCESS TO THE MACHINE

I  have  no  doubt  that  the  Orwell  1984  concept  of  a  “Big  Brother”  machine  is  quite 
possible  within  the  time  available  between  now  and  then.  Such  a  machine  would  almost 
certainly  be  capable  of  a  wide  variety  of  jobs,  medical  diagnosis  being  but  one  example 
which springs to mind. Such operations, however, could only be carried out on a
governmental basis (hence the Orwell concept) with users subscribing to the terminal equipment
just as they now do for their telephones. From an individual organisation's point of
view however, there could be distinct advantages in having smaller, cheaper private
machines - and there would also certainly be a commercial interest for the computer
manufacturer in providing such machines. In the end of course, someone will also consider
what  the  user  himself  would  like. 

A  habit  which  is,  I  think,  engrained  in  most  of  us  after  centuries  of  the  existence  of 
libraries,  is  that  of  browsing.  This  is  something  that  the  machine  has  been  tending  to 
deprive  us  of,  since  somehow,  wading  through  a  computer  listing  is  not  quite  the  same 
thing as browsing through a shelf of books. It is now possible, however, to approach a
similar situation by means of direct interrogation of the machine file using a remote
visual display console. The ESRO/ELDO Space Documentation Service hopes to have such a
capacity available initially for its own analytical staff, early in 1969, closely following
a  NASA  lead.  Ultimately  such  consoles  would  be  installed  at  strategic  points  throughout 
the  European  network  of  ESRO  establishments,  thus  enabling  the  user  to  go  direct  to  the 
machine  as  and  when  he  feels  like  it.  For  some  time  it  has,  I  think,  been  apparent  that 
future  development  would  be  in  this  direction.  It  is  essential  that  the  future  of  machine 
information  retrieval  is  not  designed  around  the  capabilities  of  the  first  and  second 
generation  computers  with  which  the  technique  was  born.  Joint  effort  on  the  part  of  the 
supplier,  i.e.  the  documentalist,  and  on  the  part  of  the  user  should  soon  achieve  the 
desired  result. 


DISCUSSION 


H.A. Stolk: What services does the ESRO documentation unit provide, and to whom are they provided?

N. E. C. Isotta: The unit can search back in files dating to 1962, covering 300,000 or more
references, in subject searches. It provides an SDI service on individually constructed
profiles but intends to transfer to standard profiles soon, as this gives a very much
cheaper service. The service is provided to ESRO and ELDO staff, members of Eurospace
and to authorised users in member states.


J. R. C. Licklider: I cannot understand how you get "immediate" indication in a mechanized
system  working  in  the  conversational  mode.  Take  the  example  that  you  have  10^  documents 
and  10^  descriptors,  and  that  a  typical  retrieval  attempt  is  specified  by  6  descriptors. 
Also  assume  thirty  users  in  a  multi-access  interactive  system.  If  you  stored  with  every 
pattern  of  six  descriptors,  the  number  of  patterns  associated  with  it,  there  would  be 
about  lO  ’***  =  10^®  items  in  the  file  and  that  would  not  be  reasonable.  If  you  stored 
with  each  descriptor  the  identification  of  all  the  documents  associated  with  it  (1,000, 
or perhaps 10,000 or 100,000) you would have to transfer data from a slow secondary to a
fast primary memory six times and then evaluate the Boolean expression. The waiting time
would be 15 to 30 seconds. Is the key to limit the size of the file to, say, 100 items?

N. E. C. Isotta: I cannot explain how the system works but I have seen it working at
Lockheed Corporation in San Francisco and at the NASA facility in Washington.
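
The second storage scheme raised in the question, in which each descriptor carries the
list of documents indexed under it and a query is answered by combining those lists, can
be sketched as follows. This is an illustration only and not a description of the NASA or
Lockheed systems; the descriptors, document numbers and the function names
`build_inverted_file` and `search_all` are hypothetical.

```python
from collections import defaultdict


def build_inverted_file(documents):
    """Map each descriptor to the set of document ids indexed under it."""
    inverted = defaultdict(set)
    for doc_id, descriptors in documents.items():
        for descriptor in descriptors:
            inverted[descriptor].add(doc_id)
    return inverted


def search_all(inverted, query):
    """Return ids of documents carrying every descriptor in the query (Boolean AND)."""
    # One list of document ids is fetched per descriptor in the query;
    # a descriptor that was never assigned contributes an empty list.
    postings = [inverted.get(descriptor, set()) for descriptor in query]
    if not postings:
        return set()
    # Starting from the shortest list keeps the intermediate results small.
    postings.sort(key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


# Hypothetical three-document file.
documents = {
    1: {"satellite", "telemetry", "antenna"},
    2: {"satellite", "propulsion"},
    3: {"satellite", "telemetry"},
}
inverted = build_inverted_file(documents)
print(sorted(search_all(inverted, ["satellite", "telemetry"])))
```

In this scheme the work per query grows with the length of the lists fetched, which is
the transfer cost the question is concerned with.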


C. D. Vernimb: Judging by experience of rejection by users, how many irrelevant documents
are they prepared to accept in the results of a search?

N. E. C. Isotta: Users vary tremendously over the amount of material they are prepared to
look at. It is difficult to be very definite on standards of precision as this depends
too much on the individual user.


PAPER  11 


AN INTRODUCTION TO THE STUDY OF
COST  EFFECTIVENESS  IN  INFORMATION  SYSTEMS 


by 


Professor  J.N.  Wolfe 


Edinburgh  University,  UK 




SUMMARY 


Observations on the nature of cost effectiveness studies in general
are made as an introduction to the procedures being adopted in a study
of information services commissioned by the Office of Scientific and
Technical Information, UK. Cost determination for alternative types
of service is the first step in the procedure. The replacement of an
old information service by a new type and the situations in which two
alternative types of service exist side by side are evaluated. Finally,
the services provided by alternative information systems must be
evaluated.



1.  THE  NEED  FOR  COST  EFFECTIVENESS  STUDIES 

Large sums are spent each year on information services in each of the NATO countries.
As an example, the United Kingdom alone spends about 50 million pounds each year on
library services only. We have at present no reliable and consistent statistics for the amount
spent in other NATO countries, and in particular, we lack information on the amount spent
on information services other than libraries. The OECD is in the process of attempting
to collect this information and the study is underway under the direction of the
Studiengruppe of Heidelberg, Germany.

We know, however, that the total sum being spent is sufficiently large and growing with
sufficient rapidity to present a serious economic problem. This economic problem has
several aspects. First, there is the question of how much ought to be spent on information
services in general. Secondly, there is the question of how rapidly this sum should grow.
Thirdly, there is the question of the most appropriate division of expenditure among the
competing types of information service which might be offered, and fourthly there is the
question of the most appropriate organisation of information services both within a single
country and between countries.

These sorts of questions may have seemed to be of only academic interest during the
last decade or so, for there has been general agreement that the volume of funds available
for information services has hitherto been too low, and funds have been expanded with
considerable rapidity. During this period too there has been rapid technological change
in the information industry. There are now many more technically developed candidates for
absorption of information funds than was the case even a decade ago. As new techniques
pass from the laboratory and pilot stage into the world of practical possibility the
question of economic viability and value for money becomes a very real and pressing one.


2.  THE  OSTI-OECD  STUDY  OF  ECONOMICS  OF  INFORMATION  SYSTEMS 

It was in this context that the Office for Scientific and Technical Information in the
United Kingdom, acting in collaboration with the OECD, decided to undertake a study of the
economic aspects of information systems. The study was commissioned from the Department of
Economics in the University of Edinburgh, and involves a large team of workers including
five full-time economists and a full-time information officer, two accountants, two
statisticians, and five part-time economists. The work has been underway for approximately
four months but will not be completed until the end of the calendar year 1969. One aspect
of this work which is already rather far advanced is an economic study of the library
system and particularly the public library system in the United Kingdom. It is proposed
to publish very shortly a volume of essays on this topic. Most of the papers involved
are quantitative and econometric in character and it would be difficult to summarise any
of them briefly. I would however like to mention here two papers in particular which seem
to me to offer considerable interest. One of these is a paper by Mr. Ralph Young on the
Forecasting of the Demand for Library Services in the Public Library Sector by econometric
means. This paper provides what I think is the first attempt to offer a quantitative
forecasting technique for library demand which is not simply an extrapolation of past trends.


Mr. Young shows that even at this early stage of analysis it is possible to forecast
the appropriate level of library provision in a general way at least with considerably
improved accuracy. This technique has been applied, as I say, to the public library
system but I think that it offers considerable possibilities of extension to library
systems within private firms or government agencies. Another paper of some interest is
that prepared by Dr. Jacob Moreh which examines in a statistical and econometric way the
problem of economies of scale in library services. Dr. Moreh attempts, and I think for
the first time, to go below the level of simply comparing large groups of dissimilar
libraries with one another on the basis of an average cost figure. Such a procedure,
while common enough in practice, is of course statistically exceedingly unreliable.

Dr. Moreh on the other hand utilises techniques made familiar in production function
studies to examine the cost functions of operating within the public library system on
the basis of a variety of independent variables including the number of branches in each
library system, the number of employees, the number of volumes, and the volume of ancillary
services such as gramophone record issues. While his results are not yet completely
analysed, they do seem to indicate that the popular belief in economies of scale in the
library world may be somewhat over-simplified.


3.  THE  NATURE  OF  COST  EFFECTIVENESS  STUDIES 

Before moving to some account of the larger economic study now underway, it may be
useful to provide some introductory observations on the nature of cost effectiveness
studies in the context of information and library services. It will be recalled that cost
effectiveness techniques were given substantial development by work undertaken on behalf
of the United States Department of Defense largely in the Rand Corporation of Santa Monica,
California. Put in the simplest way, the notion of a cost effectiveness study is an attempt
to discover the relative magnitude of costs and benefits accruing from alternative forms
of expenditure. More concretely, the early studies involved assessment of the relative
cost per ton of bomb delivery for example. The essence of a cost effectiveness study is
the reduction of the benefits of alternative task systems to some kind of commensurable unit.
Once this is done the problem becomes merely one of comparing the alternative task outputs
with their costs.

Looking  at  the  matter  in  another  way,  we  may  see  the  cost  effectiveness  study  as  simply 
an improvement on the more normal cost study. The traditional cost procedure involves an
examination of the costs of two alternative tasks. But clearly costs are not a sufficient
determination  of  which  task  provides  the  best  outcome. 

We  must  consider  as  well  the  benefits  achieved  in  each  outcome. 

Let us take an extremely simple example drawn from everyday life. Supposing we wished
to determine which was the wiser purchase, an orange or a lemon. We could easily determine
the cost of the orange and the cost of the lemon. The question of which of the two fruits
provides the better buy for money depends however upon what we wish the fruits for. If we
are anxious to obtain a given quantity of Vitamin C, for example, it may well be that the
lemon provides the better bargain. If our object is to provide a refreshing morning drink,
and we wish therefore to maximise the sugar content of the citric juices, then a different
answer may be obtained. We cannot therefore tell which fruit it would be worth our while
to purchase until we determine the objectives for which we are purchasing them.
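
The orange and lemon comparison can be put in numbers. The sketch below is illustrative
only: the prices and the vitamin C and sugar contents are invented placeholders, not
measured values, and the names `cost_per_unit` and `best_buy` are hypothetical. It shows
that the ranking of the two purchases depends on which benefit is taken as the common unit.

```python
# All figures are hypothetical placeholders chosen for illustration.
FRUITS = {
    "orange": {"price": 6.0, "vitamin_c": 50.0, "sugar": 9.0},
    "lemon": {"price": 4.0, "vitamin_c": 40.0, "sugar": 2.0},
}


def cost_per_unit(fruit, objective):
    """Cost of one unit of the chosen benefit obtained from one fruit."""
    data = FRUITS[fruit]
    return data["price"] / data[objective]


def best_buy(objective):
    """Fruit with the lowest cost per unit of the chosen benefit."""
    return min(FRUITS, key=lambda fruit: cost_per_unit(fruit, objective))


for objective in ("vitamin_c", "sugar"):
    print(objective, "->", best_buy(objective))
```

With these assumed figures the lemon is the cheaper source of vitamin C and the orange the
cheaper source of sugar, so neither fruit is the better buy until the objective is fixed.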


4.  ESTABLISHING  THE  COST  OF  INFORMATION  SERVICES 

With this introduction in mind there should be little difficulty in understanding the
procedure which is being adopted with respect to cost effectiveness in information services.
The first part of our job is to determine cost for alternative types of service. This
presents certain features of difficulty because of the fact that information services, like
most public services, do not normally keep accounts upon what is called a functional basis.
That is to say, the accounts of most information services take the form of a list of
expenditure by named item of expenditure, that is to say labour, materials, rent, heat,
etc. They do not normally assign these expenditures to the manifold functions which an
information service in fact attempts to achieve. It is therefore necessary to recast the
accounts of the information services in functional form before any serious further work
can be done.

One of the basic difficulties here is of course the assignment of overhead costs to the
various alternative functions. We have to ask for example what proportion of the time of
a head librarian ought to be attributed to his work as head of an information service as
well as of a library service, in a unit which offers both library and information services.
Similarly we may ask what proportion of the cost of heating an information centre is to be
attributed, let us say, to the preparation of abstracts on the one hand or to the preparation
of translations on the other. It will be clear that, however much care is taken, there will
be a certain measure of arbitrariness in such calculations. It is our object not to eliminate
arbitrariness entirely, but rather to reduce it to manageable proportions.
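
Recasting accounts from named items of expenditure into functional form can be sketched as
below. This is a minimal illustration and not the procedure used in the study: the
amounts, the three functions and the allocation shares are all hypothetical, and the shares
are exactly the arbitrary element referred to above.

```python
# Hypothetical accounts by named item of expenditure (any currency unit).
LINE_ITEMS = {"labour": 1000.0, "materials": 200.0, "rent": 300.0, "heat": 100.0}

# Assumed share of each item attributed to each function; each row sums to 1.
SHARES = {
    "labour": {"abstracts": 0.5, "translations": 0.3, "library": 0.2},
    "materials": {"abstracts": 0.25, "translations": 0.25, "library": 0.5},
    "rent": {"abstracts": 0.2, "translations": 0.2, "library": 0.6},
    "heat": {"abstracts": 0.2, "translations": 0.2, "library": 0.6},
}


def functional_accounts(line_items, shares):
    """Redistribute each item of expenditure over the functions it supports."""
    totals = {}
    for item, amount in line_items.items():
        row = shares[item]
        # Every item must be fully allocated, otherwise cost is lost or invented.
        if abs(sum(row.values()) - 1.0) > 1e-9:
            raise ValueError(f"shares for {item!r} do not sum to 1")
        for function, share in row.items():
            totals[function] = totals.get(function, 0.0) + amount * share
    return totals


accounts = functional_accounts(LINE_ITEMS, SHARES)
for function, cost in accounts.items():
    print(f"{function}: {cost:.2f}")
```

The total is unchanged by the recasting; only its division among functions depends on the
shares chosen, so the sensitivity of any conclusion to those shares can be checked by
varying them.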

One important issue is the extent to which information services may be added to existing
library activities at lower costs than information services can be provided in a
purpose-built organisation. On the one hand we might expect that the sharing of certain overheads
with a library would produce lower costs in the integrated operation. On the other hand
the greater expertise which can be developed in a specialised and purpose-built organisation
may conceivably offer economies of substantial importance. This balance between economies of
scale and economies of specialisation is, as everywhere else in industry, an important
question deserving the most careful examination.

The central core of our method consists of evaluating two particular types of situation.
The first is a situation in which an old type of information service is to be superseded by
a new type. This situation provides alternative information on costs and also provides
information on the change in value of the service received by changing over between the two
systems. An alternative approach consists of examining situations in which two alternative
types of information service exist side by side. For example, we may have certain
organisations which utilise an advanced information service while other organisations utilise an
older style of information service. Here costs and effectiveness may be compared on a
cross-section basis. It will be understood, however, that in this case there may be expected
to be a substantial amount of extraneous information introduced because of the possibility
of underlying quality difference between the units using the technically advanced information
service and those using the technically less advanced information service.

The final part of our work consists in evaluating the services provided by alternative
information systems. This is clearly the most difficult part of our job. It is difficult
partly because previous attempts to deal with user requirements and user needs have not been
directed specifically to economic investigations. There is a fundamental difference between
technological criteria of efficiency in this context and criteria of economic efficiency.
Ideally, one would like to obtain estimates of the impact of the information service on the
productivity of the workers receiving the information service. In practice this level of
productivity is likely to be very much influenced by extraneous factors. This is a
particularly damaging point if we are dealing with cross-section studies of a particular
industry which has different information services in different firms. We are likely, I
think, to find that good information services are in fact characteristic of technologically
advanced firms, and if this is the case any attempt to correlate efficiency with information
services is likely to give us too optimistic a result. When we deal with changes in
information services affecting all the units in an industry, we have, I think, a rather more
practical proposition, although here we will, I am afraid, be hampered for some time yet by
a shortage of instances. I would expect, however, that as the number of information services
examined increases, a statistically reliable result may eventually be approximated.
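
The over-optimism of a cross-section comparison can be shown with constructed figures. In
the sketch below every number is hypothetical: productivity is built as a fixed base plus
20 for a technologically advanced firm plus 5 for a good information service, so the true
effect of the service is 5 by construction. Comparing firms within each technological
level is a standard device introduced here for illustration; it is not claimed to be the
method of the study.

```python
def productivity(advanced, service):
    """Constructed productivity: the service adds 5, being advanced adds 20."""
    return 100 + 20 * advanced + 5 * service


# (advanced, service) pairs for twenty hypothetical firms.
FIRMS = (
    [(1, 1)] * 8 + [(1, 0)] * 2    # advanced firms mostly have the service
    + [(0, 1)] * 2 + [(0, 0)] * 8  # other firms mostly do not
)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_effect(firms):
    """Mean productivity with the service minus mean productivity without it."""
    with_service = mean(productivity(a, s) for a, s in firms if s == 1)
    without_service = mean(productivity(a, s) for a, s in firms if s == 0)
    return with_service - without_service


def stratified_effect(firms):
    """The same comparison made within each technological level, then averaged."""
    effects = []
    for level in (0, 1):
        group = [(a, s) for a, s in firms if a == level]
        effects.append(naive_effect(group))
    return mean(effects)


print("cross-section comparison:", naive_effect(FIRMS))
print("within-level comparison:", stratified_effect(FIRMS))
```

The crude comparison credits the information service with part of the advantage that
belongs to the technologically advanced firms; the within-level comparison recovers the
constructed effect.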

There  are  alternative  methods  of  obtaining  effectiveness  measurements  from  information 
services.  Some  of  these  consist  of  sampling  opinion  about  efficiency.  Others  consist  of 
obtaining  objective  characteristics  of  the  functioning  of  the  information  service.  But 
this particular problem requires further consideration.


DISCUSSION


H.  F.  Vessey:  Have  you  considered  the  cost  of  not  providing  information  when  making  your 
evaluations  of  system  effectiveness? 

J.N. Wolfe: It is not possible to make allowance for a factor of this sort. All
evaluation must be based on objective data. The first attempts at quantification of a service
may  not  give  a  satisfactory  result  but  by  repeated  efforts  it  is  possible  to  develop  a 
satisfactory  method  of  measuring  effectiveness. 


N. E. C. Isotta: The provision of information to scientists and engineers must be considered
as part of their continuing education and as such its value cannot be quantified immediately.
The value of a piece of information might not emerge for several years. I do not agree
that the amount of information available should be considered as uniform, one of your basic
premises. It is precisely the non-uniformity which we have to overcome.

J.N. Wolfe:  I  would  certainly  agree  with  your  first  point,  but  on  a  matter  of  obtaining 
administrative  support  for  expenditure  on  a  system  it  is  necessary  to  show  that  it  will  be 
of some practical value.


TECHNICAL INFORMATION SERVICES AND USER NEEDS

W.C. Christensen


1.  INTRODUCTION 

To  begin,  I  would  like  to  define  technical  information  in  a  way  which  I  have  found 
convenient.  The  view  which  I  have  adopted  is  that  technical  information  is  the  generic 
term  embracing  the  full  spectrum  of  information  generated  or  used  by  personnel  working  in 
the  scientific  or  engineering  domain.  Technical  information  can  then  be  divided  into  two 
subcategories - scientific information and technical, or if you prefer, engineering data.
Scientific information is defined as technical information which adds to the general body
of knowledge about a natural phenomenon, material property, or about a scientific or
engineering discipline. Scientific information does not disclose a specific connection
with nor application to the design, production, operation, or maintenance of an item of
equipment.

Technical data, on the other hand, is technical information obtained from the design,
development, manufacture, operation, maintenance, and logistic activities and is used by
the  recipient  to  design,  produce,  operate  or  maintain  equipment.  For  example,  technical 
data  includes  design  data,  development  data,  production  data,  manufacturing  data,  logistics 
data,  and  maintenance  data.  This  distinction  between  scientific  information  and  technical 
data  is  important  since,  as  will  be  shown  later,  our  major  technical  information  problems 
are  associated  with  technical  data  -  not  scientific  information. 

Now  that  we  have  established  some  boundaries  on  the  subject  we  are  dealing  with,  let’s 
take  a  look  at  the  general  categories  of  audiences  who  use  technical  information. 


2.  WHO  USES  TECHNICAL  INFORMATION? 

As shown in Fig. 1, there are three major audiences for technical information - the
general audience, the mission audience, and the technical management audience.

Technical  information  used  by  the  general  audience  is  characterized  by  the  fact  that 
the  generator  of  the  information  does  not  know  who  specifically  will  use  the  information 
or  when.  As  an  example,  we  have  over  850,000  U.S.  Department  of  Defense  technical  reports 
centrally  stored  and  available  from  the  Defense  Documentation  Center.  Most  of  these 
reports  were  required  to  document  the  results  of  Defense  research  and  development  efforts. 
However,  the  secondary  use  of  this  information  by  the  general  audience  may  be  for  purposes 
totally  different  from  those  for  which  the  work  was  undertaken  and  at  a  time  considerably 
removed  from  that  during  which  the  information  was  generated.  This  diversity  of  uses  and 
time  differential  creates  serious  problems  in  effectively  retrieving  and  employing  the 
information. This retrieval problem is growing more difficult as the degree of
technological sophistication increases. Our primary difficulty is that the technical documents
are  written  in  relation  to  a  specific  end  goal  which  was  the  basic  objective  of  the  work. 
Many  times  this  end  goal  involves  a  highly  complex  piece  of  equipment  such  as  a  missile 
or  a  tank  which  involves  a  multitude  of  discrete  innovations  all  of  which  are  combined  to 
produce  the  end  goal.  The  degree  to  which  each  discrete  innovation  is  documented  is 
highly  dependent  on  the  importance  attached  by  the  generator  in  relation  to  the  end  goal. 
This creates two difficulties. First is the ability to index each discrete piece of
technology so that the report can be retrieved when a user requests the information.
The operator of the information storage and retrieval system is faced with the classic
dilemma - if he employs a large number of characterizing or search terms, searches will
produce a great number of documents, many of which are not particularly relevant to the
user’s needs. On the other hand, too few terms will result in many relevant documents
going unidentified.
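
The two horns of this dilemma can be expressed as the ratios commonly called precision
(the share of retrieved documents that are relevant) and recall (the share of relevant
documents that are retrieved). The sketch below is illustrative only; the document numbers
and the two sample searches are hypothetical and do not come from any Defense
Documentation Center data.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical file in which documents 1 to 4 are relevant to the user's need.
relevant = {1, 2, 3, 4}
# Generous indexing: every relevant document is found, along with much else.
broad_search = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
# Sparse indexing: nothing irrelevant is found, but relevant documents are missed.
narrow_search = {1, 2}

print("broad search:", precision_recall(broad_search, relevant))
print("narrow search:", precision_recall(narrow_search, relevant))
```

In these assumed searches the broad one finds everything at the cost of a majority of
irrelevant documents, while the narrow one returns only relevant documents but misses half
of them.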

The  second  difficulty  resulting  from  end  goal  oriented  technical  reports  is  that 
frequently  there  is  not  enough  information  related  to  a  specific  technology  for  the 
general  audience  user  to  effectively  take  advantage  of  the  past  work. 

I  will  go  into  more  detail  on  the  trials  and  tribulations  of  information  storage  and 
retrieval  later  but  for  the  moment,  let’s  turn  our  attention  to  the  mission  audience. 

This audience is characterized by a close coupling to the generator of the information.
The mission audience could be the procurement organization for a new piece of military
hardware. In this case the research and development people are well attuned to the
information needs of the procurement people with the result that not only is the precise
information needed displayed, but it is also displayed in a manner most meaningful to the
user. This close coupling between the generator and user results in efficient information
transfer. However, we often find that the information tends to stay within the relatively
narrow confines of the generator-mission user environment even though it could be of
considerable use to the general audience or other mission audiences.

Finally,  we  have  the  technical  management  audience  which  I  have  represented  by  the 
classical  pyramid.  The  increasing  expenditures  for  research  and  development  along  with 
the  additional  complexity  of  the  efforts  themselves  have  increased  emphasis  on  timely  and 
accurate  technical  management  information  systems.  Within  the  U.S.  Department  of  Defense 
we have been developing a very sophisticated technical management information system
covering  our  numerous  research  and  technology  efforts.  This  automated  system  is  designed 
to  tell  users  what  work  is  being  done,  by  whom  and  in  very  abbreviated  form  what  the 
progress  is.  While  the  system  was  primarily  designed  to  meet  a  management  need,  we  have 
found  that  over  half  of  the  users  are  working  engineers  and  scientists.  These  people  use 
the  system  to  identify  on-going  research  and  technology  efforts  related  to  their  particular 
areas  of  interest.  While  the  technical  information  content  is  minimal,  it  is  normally 
sufficient  to  determine  whether  the  performer  should  be  contacted  for  detailed  information. 


3.  USER  NEEDS 

Within these terms of reference, we in the Department of Defense have been very concerned
with two questions: what technical information does the user really need, and how well are
our various technical information services fulfilling these needs?

To obtain at least a partial answer to this complex question, we have run two
comprehensive user needs studies - one concerned with the needs of engineers and scientists
employed  directly  by  the  Department  of  Defense  and  another  covering  those  associated  with 
Department  of  Defense  contractors. 

I  have  summarized  the  results  of  these  two  studies,  performed  by  two  different 
contractors, in Fig. 2. I want to go through these in some detail because the information
is quite revealing in terms of our present information services and what we should be
striving for in the future.

Before discussing the various information-gathering characteristics of the users, a
few words on the characteristics of the users themselves are in order. First, most of
the technical information users are engineers or are working in engineering related areas.
Too often this point is overlooked and equal or greater attention is given to the
scientist and his information problems. While I do not want to belittle the information
problems of the scientists, it is engineers and other appliers of technology who are my
chief concern and unfortunately, their information problems are exceedingly complex.




Now let’s look at how our users obtain information and the type of information they
need. The first statistic pertains to the desire for information in a short period of
time. While our work showed that over 20% of the users needed information in less than
one day, most users would really like to have their information needs met instantaneously.
What frequently happens is that the user makes a quick minimum effort at getting
information. If the optimum information is not found during this first try, he will too often resort
to the use of readily available but less than optimum information. For example, an engineer
selecting materials may not use a low cost material because he cannot readily determine its
characteristics in a particular environment. Instead, he picks an expensive alloy which he knows
will do the job. This gives rise to one of the frequently used arguments against expending
resources to provide better technical information systems - the users seem to do their job
without them! However, the real question is, "How much could their performance be improved
by instituting better technical information systems?"

The next item pertains to how the user gets his information. Our studies show that he
turns to a colleague or his personal files as a first source of information, which supports
my argument that users operate on a minimum effort principle as far as acquisition of
technical information is concerned.

The next item is very important from a user need point of view. As I mentioned
previously, our main concern should be with the engineer or technologist and this statistic
clearly bears out the need for so called engineering type information. Yet this
information, consisting of design information, test data, operational data, manufacturer’s part
and component information and the like, is the most difficult to handle in a technical
information system. One aspect of the problem is that engineering information is difficult
to capture so that it can be incorporated in an information system. The difficulty stems
from both the amount of information being generated and the fact that most of it is being
created for the mission audience which is not particularly motivated to disseminate it to
the general audience. However, the more serious problem with engineering information is
that it tends to have a short half life. In other words what may be valid up to date
engineering information today may be obsolete tomorrow. We have run some experiments
with user oriented information systems where we have incorporated both engineering
information which users knew was up to date and some engineering information which the users
were not quite sure of. The results were that the engineering information which the
users were not sure was up to date wasn’t used at all - even though it was probably better
information than they could obtain from other sources.

The  cost  of  maintaining  quality  control  over  short  half  life  engineering  information 
is  very  high.  For  instance,  I  estimate  that  the  U.S.  Department  of  Defense  expends  about 
$80M  a  year  just  to  operate  its  military  specifications  and  standards  program.  When 
one  begins  to  consider  expanding  this  type  of  quality  control  to  other  sources  of 
engineering  information,  serious  questions  of  the  cost  versus  benefits  must  be  raised. 

Along this line I would like to mention one of my "pet" concerns about the information
utilization habits of engineers. Recalling that the engineer normally obtains his
technical information from his local environment, that is, his personal files and
colleagues, take a look at his private library sometime. My experience has been that his
favourite tools are often text and reference books obtained in college plus a few odds
and ends he has encountered and used in depth during his career. His college books contain
information generated at least five years before the book was published. Adding this five
years to the time since his graduation means that the information is on the average 20 to
30 years old. The various odds and ends that he has picked up over the years are similarly
in various stages of obsolescence. To me it is a wonder that he can survive in this age
of exploding technology and multi-disciplinary efforts.


4.  PROVISION  OF  TECHNICAL  INFORMATION  BY  DEPARTMENT  OF  DEFENSE 

Now that we have addressed the user needs, albeit in abbreviated fashion, let us take a
look at the existing situation in the U.S. Department of Defense from a technical information
system viewpoint.

First, we have a large central depository for technical reports resulting from Defense
research and development known as the Defense Documentation Center. The Center accessions
about 50,000 new technical reports every year. The reports are indexed upon receipt and
the bibliographic information is added to a computer based search system and, at the same
time, announced to the defence user community. Subsequently, the reports can be ordered
or a bibliography can be prepared on any given subject. At the present time, the Center
is receiving about 2 million requests for technical reports and 20,000 requests for
bibliographies each year. Granted the use factors are impressive, but some consideration
must be given to the Center's operation in terms of the user needs. First, considering
the input side of the Center, we have a major problem which I mentioned in the beginning.
The technical reports handled by the Center are prepared in relation to a specific end
goal. While these reports may have a high degree of relevancy to those intimately
concerned with that particular end goal, their effectiveness as information transfer media
to users not familiar with the area of endeavour may be low. We also have the problem
that too often the actual technical information content of these reports is low and, as
the saying goes, "garbage in - garbage out". The real problems become visible when attempts
are made to characterize the contents of these reports. It would be fine if the users
only needed to retrieve information on an end goal basis, such as development of solid
propellant missiles. However, more and more we find that users are searching for discrete
pieces of technology associated with a particular problem at hand, such as pressure
sealing of gauges. Now this information might be reported in a missile development report if
it was particularly pertinent to the overall missile development programme. The problem
is that indexing the report so that each discrete piece of technology is reported results
in a large data bank which is difficult to search effectively and, more importantly,
results in an unacceptably large number of irrelevant report identifications in response
to a user's query. Nothing can discourage a user more than loading him up with a vast
amount of information which he is not interested in. There is one further problem
associated with the operation of a central report depository such as the Defense
Documentation Center. This is the time delay associated with obtaining the information.
Regardless of how efficient the Center's operation is, there is about a 2 week delay,
primarily as a result of physical transfer of the request and resultant product. The
importance of this delay can be seen when the user's desire for rapid access to information
is considered. There are two ways to get around this situation - utilization of advanced
communication techniques, or provision of the information in advance to an information centre
in the user's immediate environment. Several other speakers are covering advanced
communication techniques so I will not dwell on them here, except to mention that we are installing
several experimental remote on-line terminals to the Defense Documentation Center.

The providing of technical information to the user locally has been the traditional
role of the technical library. The difficulties are manifold. To begin with, libraries deal
in documents - not information. The user must research the documents and extract that
information which is pertinent to his needs. Also, the technical libraries find it
increasingly difficult to maintain collections covering the full range of the interests of
the users they service. Finally, there is a communication problem between the technical
user and the non-technical librarian. I feel that this latter point is particularly
significant and that if our so-called retail stores are to become a viable part of our
technical information systems of the future, they must employ technically competent
personnel in addition to those solely concerned with storage and retrieval of documents.

These technically competent personnel, whom we might call technical information specialists,
not only provide an effective coupling between the user and the information source, but
can also answer users' queries with highly relevant information - not just documents.

One area where the U.S. Department of Defense has created technical information systems
manned by technically competent personnel is the information analysis centres. From a
technical information transfer point of view, these centres are very effective. Each of
our 26 centres is assigned a very specific subject or discipline area. Generally, the
personnel operating the centres actually spend a portion of their time working in the
subject or discipline area, providing specific answers to users' inquiries in their area of
expertise and publishing high technical content documents. Thus, these centres get around
the input problems associated with the operation of central document systems like the
Defense Documentation Center and provide the personalized coupling between the user and
the information. Because of their competence in their field of expertise, they also
provide the quality assurance factor which I discussed in relation to engineering
information. The major drawback to these centres is that they are very expensive and to date,
we have only been able to justify them for a limited number of subjects or disciplines.

This brings me to a key point which we must face in the technical information business.
The cost of various technical information systems and services can be identified.
However, the benefits in quantifiable terms are very difficult to ascertain. Intuitive
arguments that technical information systems and services are good have just about
exhausted their appeal. Within the U.S. Department of Defense we are initiating a program
of charges for selected technical information services on the basis that if the service
is of value to the user, he should be willing to pay for it.

In summary, I see three pressing needs for technical information systems of the future.
First, we must improve the quality of the technical information in our systems and I would
suggest that the best place to do this is at the source. Next, we must get more users
involved in the design and operation of technical information systems. Too often, systems
are created to serve phantom audiences. Finally, we must find ways to quantify the
benefits derived from more effective technical information systems so that decisions to
establish these systems can be based on fact and not fantasy.


The discussion on this paper follows on page 131.


DISCUSSION

R. W. G. Gendy: What steps would you propose to improve the quality of information at the
source, i.e., the standard of report writing and presentation, and in particular the
elimination of "garbage"?

W. C. Christensen: The first step is to improve the education of engineers and scientists
with regard to report writing. Secondly, the majority of reports are produced under
contract because many contractors feel that the value of their work is judged by the number
of reports produced. Contract monitoring agencies should encourage the state of affairs
where a report is only generated to give some really useful information on the work being
undertaken.


E. Keonjian: (1) What is done to reduce the amount of useless information entered into
the DDC system? (2) Could you define the steps in answering an enquiry?

W. C. Christensen: (1) Technical monitors of a contract are asked to suppress progress
reports produced solely on a time basis, e.g., every three months. (2) Taking as an
example an engineer wanting a bibliography on a special subject, the steps are:

(a) Question is analyzed and descriptors allocated from DDC Thesaurus,

(b) Computer staff put request to the UNIVAC 1107 system and identify the relevant
reports,

(c) Staff with some knowledge of the requester's speciality examine the print-out and
edit it, perhaps to reduce the number of references. If the number of references
is very great, the requester will be asked for further definition of the subject.

(d) When the subject specialist is satisfied, the list of references is sent to the
requester.


R. D. Kerr-Waller: (1) The number of data banks needed to cope with the volume of literature
handled by DDC must be considerable. How does an on-line terminal system operate when the
question can fall into any one of several data banks? (2) What charges does DDC propose
to make for its services?

W. C. Christensen: (1) About six data banks are used, but only one is the report data bank,
which contains bibliographic details of about 400,000 references. A change soon to a
UNIVAC 1108 system will make searching more rapid. (2) DDC have established a charge of
3 dollars for each hard copy report supplied; microfiche are supplied free. The twenty-six
information analysis centers operate in various ways tailored to the needs of their
circle of users. The total budget of each center has been reduced, making it necessary
for them to introduce charges, but each center decides for itself exactly how these shall
be levied.


F. Hangsted: The "General Audience" often includes the decision-makers. Their task is
often made more difficult by the amount of jargon and special terminology used in the
literature. Is there a solution to this problem?

W. C. Christensen: The basic solution is to get colleges to give better training in written
expression. In particular cases, persuasion or some direct action can often help.


A. H. Holloway: Is there a solution to the problem that what may be essential information
to some users may be "garbage" to others?

W. C. Christensen: By "garbage" I mean lengthy passages of text containing very few facts.
Elimination of this style of writing would be of advantage to everyone. I appreciate
that when writing a report it may be difficult to judge exactly what will be of interest
to a particular audience, but we must try to increase the proportion of technical content
of reports. We must also make better use of reports by finding ways of identifying
discrete pieces of information which may be of use outside the main field of the report.


SUMMARY


Selective  Dissemination  of  Information  (SDI)  provides  individual 
scientists  and  engineers  with  announcements  of  a  limited  number  of 
documents  specifically  of  interest  to  them,  in  contrast  to  the  general 
coverage  provided  by  increasingly  bulky  abstract  journals.  Selection  is 
done  by  a  computer  program,  which  compares  a  file  of  bibliographic  data 
on  current  reports  and  journal  literature  with  an  SDI  user’s  interest 
profile,  then  prints  out  references  to  matching  documents.  The  selected 
references  may  be  presented  to  the  user  on  cards  suitable  for  filing  or 
on less expensive printed lists, and may provide only the document
citation or the full abstract. Feedback by the user on the relevance of
the documents helps to optimize his interest profile for best selection.
Comparison of numerous individual interest profiles is expensive in
computer time, and profile improvement requires assistance by vocabulary
specialists. Economical service to large numbers of participants may be
provided by the use of standard subject profiles, as typified by the
NASA/SCAN (Selected Current Aerospace Notices) program which is described.


SELECTIVE DISSEMINATION OF INFORMATION
M. S. Day 


1.  INTRODUCTION 

As I am addressing working scientists and engineers, there seems no need to belabour
the trite expression, the "information explosion". I am sure that you have already
encountered the problem in the shape of great numbers of reports and articles to read and
digest and in the growing bulk of abstract journals that you must use to keep currently
aware of developments in even a limited area of your interests. While we in the profession
of information science and technology cannot say that we have kept ahead of the problem by
offering fully satisfactory solutions, I feel that advances are being made. One approach to
the problem of reducing your literature review efforts is Selective Dissemination of
Information, or SDI for short. SDI applies the advances in computer sciences already discussed at
this symposium to the task of providing a personalized current awareness service.

Current  awareness  services  are  not  new.  Most  libraries  have  long  provided  patrons  with 
copies  of  current  documents  that  the  librarian  has  decided  will  be  of  interest  to  particular 
individuals.  Library  accession  lists  are  frequently  categorized  to  call  newly  received 
documents  to  the  awareness  of  groups  of  potential  users.  Current  issues  of  abstract 
journals, when routed to pre-established distribution lists, also are current awareness
tools. 

But  the  usefulness  of  such  methods  is  limited  by  inconsistent  selection  or  by  excessive 
volume  of  material  announced.  In  the  case  of  abstract  journals,  even  a  categorization 
scheme  does  not  overcome  the  problem  of  scanning  a  great  bulk  of  abstracts.  Nor  does  such 
journal  categorization,  with  announcement  of  a  document  only  in  a  single  category,  provide 
for complex interests, which often cut across many fields.

Some  scientists  and  engineers  may  claim  that  they  have  no  need  for  a  selective  current 
awareness  service.  They  may  be  those  active  leaders  in  their  specialty  who  belong  to  the 
so-called "invisible colleges". Besides attending all pertinent conferences, they exchange
and file preprints and reprints. For the great majority of scientists and engineers,
however, a more formal and efficient service is necessary to alert them to current documents
of  specific  significance.  Even  members  of  invisible  colleges  find  that  a  current  awareness 
service  alerts  them  to  timely  reports  and  journal  articles  that  might  otherwise  be  delayed 
in  reaching  their  attention. 

In describing what SDI can do for you, it is also essential to refer to the mechanism of
the SDI operations and especially to the relative costs and efforts involved in the many
different SDI systems that can be designed. As users, you will be concerned with obtaining
the best design and operation possible. Obviously, if the managers of your firm or
professional society feel that a proposed SDI system is excessively costly in relation to the
organization's many other goals, it will not be established. As potential users, you should
be prepared to participate actively in the design of information systems and be able to
demonstrate the cost effectiveness of the service you will receive.


2.  PRINCIPLES  OF  SDI 

SDI is a current awareness tool and results in the selection and announcement of current
documents having a high probability of interest to the individual user. The fundamental
element in selection is the comparison, by computer, of two data files (Fig. 1). One is the
file of bibliographic data assigned to newly received reports. These data include subject
index terms and other document representations -- the authors, corporate sources, supporting
agency, contract or grant number, etc. The other file contains the users' interest
profiles, which are equivalent to bibliographic search strategies (Fig. 2). The interest
profiles consist of bibliographic data elements, such as subject index terms, related by
the common Boolean logic expressions, such as AND, OR, or NOT. Other methods of relating
terms are possible; e.g., by assigning relative weights to each term, a certain minimum
total weight of matching index terms then being required before a document is chosen for
announcement. Authors, contracts, and other document identifiers may also be included in
the search strategy. The particular features of the profile structure and the flexibility
in the document identifiers that the profile can incorporate depend on the computer
capabilities available.
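
The two ways of relating profile terms described above, Boolean logic and weighted
thresholds, can be illustrated with a minimal modern sketch. It is not a description of any
actual SDI program: the document identifiers, index terms, weights, and function names are
all hypothetical, and a document is reduced to its set of index terms.

```python
from typing import Callable, Dict, List, Set

# Assumption: a document is represented only by its set of index terms.
Document = Set[str]
Profile = Callable[[Document], bool]


def term(t: str) -> Profile:
    """Profile element that matches when index term t is assigned to the document."""
    return lambda doc: t in doc


def AND(*parts: Profile) -> Profile:
    return lambda doc: all(p(doc) for p in parts)


def OR(*parts: Profile) -> Profile:
    return lambda doc: any(p(doc) for p in parts)


def NOT(part: Profile) -> Profile:
    return lambda doc: not part(doc)


def weighted(weights: Dict[str, int], threshold: int) -> Profile:
    """Match when the summed weights of the matching terms reach the threshold."""
    return lambda doc: sum(w for t, w in weights.items() if t in doc) >= threshold


def select(documents: Dict[str, Document], profile: Profile) -> List[str]:
    """Compare the document file with one profile; return matching identifiers."""
    return sorted(doc_id for doc_id, terms in documents.items() if profile(terms))


# Hypothetical file of newly received documents.
documents = {
    "DOC-1": {"SUPERSONIC TRANSPORTS", "SONIC BOOMS"},
    "DOC-2": {"SUPERSONIC TRANSPORTS", "WIND TUNNELS"},
    "DOC-3": {"HEAT TRANSFER"},
}

boolean_profile = AND(
    term("SUPERSONIC TRANSPORTS"),
    OR(term("SONIC BOOMS"), term("HEAT TRANSFER")),
    NOT(term("WIND TUNNELS")),
)
weighted_profile = weighted(
    {"SUPERSONIC TRANSPORTS": 3, "SONIC BOOMS": 2, "WIND TUNNELS": 1}, threshold=4
)

boolean_hits = select(documents, boolean_profile)
weighted_hits = select(documents, weighted_profile)
```

With these sample data the Boolean profile selects only DOC-1, while the weighted profile
also admits DOC-2, whose two matching terms together just reach the threshold.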

The interest profile is of first importance in the success of an SDI system. Your profile
is not just a paragraph describing your interests; it is a rational set of specific terms
in the same technical language used by the document indexers. Structuring an interest
profile may require considerable skill. If you were interested, for example, in the subject of
supersonic transports, it would not be sufficient to put just the term "Supersonic
Transport" in your profile. Documents specifically on the Concorde might be indexed to the term
Concorde Aircraft and not to the general term Supersonic Transport. What other aspects of
supersonic flight are you interested in -- clear air turbulence, sonic boom, general
concerns of international law affecting civil aviation, or basic engineering problems involving
supersonic heat transfer, supersonic flutter, or supersonic wind tunnels? Are you interested
in getting every report on a given contract? Do you want to limit the number of
announcements that you receive, remembering that the total number of documents indexed by certain
common terms might be quite large? How is this limitation on number of announcements to be
done on your interest profile -- by removing index terms, or by restricting selection through
Boolean logic relationships?
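
The Concorde example above can be made concrete with a short sketch. It assumes a
hypothetical thesaurus fragment that maps a general index term to its narrower terms; the
term spellings and function names are illustrative only and are not taken from any actual
indexing vocabulary.

```python
from typing import Dict, List, Set

# Hypothetical thesaurus fragment: general term -> narrower terms.
NARROWER: Dict[str, List[str]] = {
    "SUPERSONIC TRANSPORTS": ["CONCORDE AIRCRAFT"],
}


def expand(terms: Set[str], narrower: Dict[str, List[str]]) -> Set[str]:
    """Return the given terms plus every narrower term reachable from them."""
    expanded: Set[str] = set()
    pending = list(terms)
    while pending:
        t = pending.pop()
        if t not in expanded:
            expanded.add(t)
            pending.extend(narrower.get(t, []))
    return expanded


def matches(doc_terms: Set[str], any_of: Set[str], all_of: Set[str] = frozenset()) -> bool:
    """Match if the document carries at least one any_of term and every all_of term."""
    return bool(doc_terms & any_of) and all_of <= doc_terms


concorde_report = {"CONCORDE AIRCRAFT", "SONIC BOOMS"}

# The general term alone misses the report, because it was indexed to the narrower term.
naive_hit = matches(concorde_report, {"SUPERSONIC TRANSPORTS"})

# The expanded profile catches it.
profile_terms = expand({"SUPERSONIC TRANSPORTS"}, NARROWER)
expanded_hit = matches(concorde_report, profile_terms)

# Restricting through an added AND condition cuts the number of announcements.
restricted_hit = matches(concorde_report, profile_terms, all_of={"HEAT TRANSFER"})
```

The last call shows the second of the two limiting methods mentioned above: the index terms
stay in the profile, but an additional required term narrows the selection.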

As an SDI user, you would have to take an active part in structuring your profile, or
else have it written for you. Because of the complexity of a satisfactory interest profile,
experience with SDI systems has shown that the scientist or engineer requires considerable
help in constructing his profile. Such help requires the services of a professional
reference analyst, who has the authorized authority terms and indexing patterns and practices
at his fingertips. I again wish to point out that the success of an SDI program is directly
related to the quality of the user profiles.


3.  ELEMENTS  OF  SDI 

Besides  the  interest  profile,  features  essential  to  any  SDI  program  are: 

(a)  A  standard  form  for  presenting  selected  announcements  to  the  user.  This  may  be  in  a 
form  that  the  user  can  conveniently  retain. 

(b) A method for conveniently requesting a copy of an announced document from a local
library or from the central operator of the SDI service.

(c) Routine feedback by the user to the system as to his degree of satisfaction with each
document. The feedback should provide a quantitative measure of the performance of
the user's interest profile and of the operation of the over-all system.

Many organizations, both in the United States and Europe, have initiated SDI programs to
date. Their experience, as reported in the literature, can be drawn on in designing new
current awareness systems. I am most cognizant of the programs of the National Aeronautics
and Space Administration. NASA has been a leader in the SDI field, having operated several
types of program since late 1963. Its SDI services have been distinguished by volume of
input and by size of user population. During 1967, for example, 875 interest profiles were
matched four times each month against the data files corresponding to the full contents of
the current issues of Scientific and Technical Aerospace Reports and International
Aerospace Abstracts. Aerospace reports, journal articles, conference papers, etc., matched
during the year totaled 63,700; and a total of almost 800,000 announcements were distributed.
NASA is now moving into new evolutionary phases of current awareness service, as I will
discuss.
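
To give a feel for the scale of the 1967 operation, the figures quoted above can be combined
in a few lines of arithmetic. This is only an illustrative sketch: "almost 800,000" is
treated as 800,000, so the derived averages are slight overestimates, and the variable names
are my own.

```python
profiles = 875               # interest profiles on file during 1967
runs_per_month = 4           # matching runs each month
documents_matched = 63_700   # reports, articles, papers matched in the year
announcements = 800_000      # "almost 800,000" in the text; used as an upper bound

# Number of profile-against-file comparisons in the year.
profile_runs_per_year = profiles * runs_per_month * 12

# Average announcements received by one profile holder, per year and per run.
announcements_per_profile = announcements / profiles
announcements_per_run = announcements_per_profile / (runs_per_month * 12)

# Average number of profile holders to whom one document was announced.
recipients_per_document = announcements / documents_matched

print(profile_runs_per_year)
print(round(announcements_per_profile))
print(round(announcements_per_run))
print(round(recipients_per_document, 1))
```

On these assumptions each profile holder received on the order of 900 announcements a year,
or roughly 19 per run, and a document was announced to about a dozen users on average.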


4.  TYPES  OF  ANNOUNCEMENT  FORMS 

Numerous forms have been designed by SDI system operators for announcing selected
documents to users. A distinction can be made between the card type of announcement, with each
announcement  issued  as  a  unit  record,  and  the  listing  type,  with  the  announcements  printed 
continuously  on  sheets. 

4.1 Card format

The  majority  of  SDI  services  provide  the  user  with  a  card  for  each  selected  announcement. 
Because  SDI  is  generally  thought  of  in  the  framework  of  a  computerized  information  system, 
this  card  is  typically  an  electronic  data  processing  (EDP)  or  tab  card.  Systems  providing 
edge-notched  cards  or  other  announcement  form  designs  are  feasible  for  information  services 
of  limited  scope.  The  card  may  present  a  full  abstract  or  merely  a  bibliographic  reference. 
If  only  a  document  citation  is  presented,  its  limited  informativeness  may  be  enriched  by 
also  printing  out  the  index  terms  assigned  to  the  document. 

The material presented on the SDI announcement card may be computer-printed, or it may be
duplicated by, for example, offset printing. Offset reproduction permits full abstracts,
even with special symbols and illustrations, to be reduced in size and presented on a single
card, whereas a computer printout is strictly limited in the number of lines of information
that can be presented. A disadvantage of offset reproduction is the need for two operational
procedures. The computer first punches a card with the user's name and address and an
identifying number for the selected document. The punches are then interpreted into printed
characters on the face of the card. The abstract must then be reproduced by offset onto
the corresponding punched card. Much handling and sorting of the cards is involved in such
a dual system.

The  notification  cards  received  by  SDI  users  are  usually  designed  so  that  a  stub  may  be 
detached  and  returned  to  the  library  for  requesting  a  copy  of  the  document,  or  for  merely 
indicating  that  the  announcement  was  or  was  not  of  interest. 

The user's address and the abstract need not both be presented on a single card, although
this is the common practice. NASA's first SDI program, operational from December 1963 to
January 1966, provided the user with two cards for each announcement (Fig. 3). One was an
EDP card which was punched and interpreted with the user's name and address and the document
number. These cards contained small prescored blocks which the user could punch out to
express his evaluation of the announcement; i.e., that the announced document was (1) of
interest and that a copy was wanted, (2) of interest but that no copy was wanted at the
moment, or (3) not of interest. The second card was not computer manipulated, although
it was cut to the same size and shape as the typical computer punched card. It presented the
full offset-printed abstract of the selected document. The two cards for an announcement
were inserted into a single window envelope, with the user's name and address visible. As the
envelopes were necessarily in order by the abstract number, they were then manually sorted
according to the user's organization for batched mailing and subsequently by the
organization's mail room for internal distribution (Fig. 4).

Cards are very popular with the SDI user, as he may file those of particular interest in
a personal desk-drawer file. Undoubtedly, this is a valuable tool for many scientists and
engineers. However, maintaining an individual file, either of cards or documents, can be
expensive in terms of the individual's time, and possibly in storage space. SDI is primarily
a current awareness service, and provision for a continuing bibliographic data file is of
subordinate value.

The cost of a particular SDI service depends on so many factors of input volume,
materials used, computer processing, degree of profile assistance, geographical
distribution of users, etc., that only a rough figure can be suggested for the cost of an operating
system. Detailed cost analysis should precede the implementation of any SDI proposal. A
card-type SDI system might fall in the range of $100 to $150 per user per annum for a large
volume of input references; e.g., the total references in Scientific and Technical Aerospace
Reports and International Aerospace Abstracts.

4.2  Listing  format 

Less expensive than card-type announcements are computer-printed listings of selected
bibliographic references. In general, listings present only bibliographic references,
perhaps with the index terms to enhance the document content information provided by the title,
author, and other reference elements. If the abstracts are on machine-readable files, the
abstract may be printed out in full or in part. While this may be helpful to the user's
understanding of the content of the document, it incurs the expense of added computer use,
increased bulkiness of the announcement package, and added review time for the user.

NASA’ s  present  SDI  system,  in  effect  since  February  1966,  is  an  adaptation  of  the  simple 
listing.  A  three-copy,  no-carbon-requlred  form  is  used  (Pig. 5).  The  computer-printed 
bibliographic  references  of  course  appear  on  all  three  sheets,  together  with  the  user’s 
name  and  address.  The  computer  also  prints  blocks  (lozenges)  opposite  each  announcement. 

The empty blocks are for the recipient’s use in checking the relevance of the announcement
to him: whether it is of interest and the document is requested, of interest but the
document not wanted, or of no interest. When the user receives his announcements, he marks
his evaluation opposite each announcement, simultaneously marking all copies, then tears off
the original for retention if he desires. The other copies are forwarded to his library,
where one of the copies is used to fill document requests while the other is returned to the
system operator. The operator tabulates all responses and computes the ratio of the number
of relevant announcements, as indicated by the user, to the total number of announcements,
for each user and for the over-all system. The tabulated results serve as measures of
operational effectiveness. Again, costs of a list-type system depend on the information
presented and the other factors common to all SDI systems, but might fall in the range of 60
to 70 per cent of the cost of a card-type system.
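
The operator’s tabulation can be illustrated by a minimal Python sketch. The response codes,
the record layout, and the function name `tabulate` are assumptions made for the
illustration and are not taken from the NASA system; an announcement is counted as relevant
here if the user marked it of interest, whether or not he requested the document.

```python
from collections import defaultdict

# Hypothetical codes for the three lozenges printed opposite each announcement.
REQUESTED = "interest_requested"          # of interest, document requested
NOT_REQUESTED = "interest_not_requested"  # of interest, document not wanted
NO_INTEREST = "no_interest"               # of no interest

RELEVANT = {REQUESTED, NOT_REQUESTED}


def tabulate(responses):
    """Return (per-user relevance ratios, over-all relevance ratio).

    `responses` is an iterable of (user, response_code) pairs, one pair
    per announcement returned to the system operator.
    """
    relevant = defaultdict(int)
    total = defaultdict(int)
    for user, code in responses:
        total[user] += 1
        if code in RELEVANT:
            relevant[user] += 1
    per_user = {user: relevant[user] / total[user] for user in total}
    all_total = sum(total.values())
    overall = sum(relevant.values()) / all_total if all_total else 0.0
    return per_user, overall


# Invented returns from two users.
sample = [
    ("user_a", REQUESTED), ("user_a", NOT_REQUESTED),
    ("user_a", NO_INTEREST), ("user_a", REQUESTED),
    ("user_b", NO_INTEREST), ("user_b", REQUESTED),
]
per_user, overall = tabulate(sample)
```

In this invented sample, user_a marks three of four announcements as of interest and user_b
one of two, so the over-all ratio is four in six.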

4.3  Mixed  announcement  forms

Listings are adequate as announcement tools, but they lack an important element of a fully
automated system, namely, machine readability of document requests and response evaluations.
In card systems, this is provided by a stub, which is detached and sent to the user’s
library. Holes punched in the stub can be read by computer, which can then prepare document
order forms and tabulate the user response data.

The advantages of both listings and cards can be combined. A computer-printed listing of
selected references can be accompanied by a stack of electronic data processing cards, which
have been prepunched and interpreted with the document and user’s identification. The user
selects the cards that correspond to the announcements he has just read and punches out the
appropriate prescored holes to express his interest evaluation and to request copies of
desired documents. Returned to his library, the cards can serve as links in a fully
automated system.


5.  USER  FEEDBACK 

Optimum SDI service depends primarily on the user’s interest profile and its improvement
through feedback. It is important to understand the meaning of optimum service. Clearly,
you as a user would best be served if, of the announcements you receive, all refer to
documents that are definitely of interest to you. The announcements you receive should also
include every one in the file that would be of interest to you if you had a chance to
review it. Unfortunately, these are mutually incompatible goals. If you attempt to express
your interests by structuring your interest profile in rather broad terms, some documents
of no interest will be announced to you because of the various meanings that the indexer
might have attributed to these terms while indexing the documents. If you attempt to be
very precise in your choice of profile terms and further restrict their selective power by
requiring Boolean intersections between terms, then you will miss being informed of some
documents that you might have found of interest. Information scientists speak of the
"relevance ratio", which is (1) the number of documents of interest divided by (2) the
number of documents that are announced, and of the "recall or coverage ratio", which is
(1) the number of relevant documents announced divided by (2) the total number of relevant
documents in the system.

In a very good system, you might find that 75 per cent of the announcements you receive
are of interest, while these are perhaps 90 per cent of the relevant documents that are in
the input data file.
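
The two ratios, and the tension between a broad and a narrow profile, can be made concrete
with a small Python sketch. The five-document file, its index terms, and the user’s
judgments of relevance are invented for the illustration; only the two definitions come
from the text.

```python
def ratios(announced, relevant):
    """Return (relevance ratio, recall ratio) for two sets of document ids."""
    hits = announced & relevant
    relevance = len(hits) / len(announced) if announced else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return relevance, recall


# Hypothetical input file: document id -> index terms assigned by the indexer.
file_index = {
    "d1": {"lasers", "communication"},
    "d2": {"lasers", "welding"},
    "d3": {"lasers", "communication", "radar"},
    "d4": {"communication"},
    "d5": {"optical radar"},
}
# Documents this imaginary user would actually have wanted to see.
relevant = {"d1", "d3", "d5"}

# Broad profile: any single term selects a document (Boolean OR).
broad_terms = {"lasers", "communication", "optical radar"}
broad = {doc for doc, terms in file_index.items() if terms & broad_terms}

# Narrow profile: every term must be present (Boolean intersection).
narrow_terms = {"lasers", "communication"}
narrow = {doc for doc, terms in file_index.items() if narrow_terms <= terms}

broad_relevance, broad_recall = ratios(broad, relevant)
narrow_relevance, narrow_recall = ratios(narrow, relevant)
```

With this toy file the broad profile announces all five documents (relevance ratio 0.6,
recall ratio 1.0), whereas the narrow profile announces only two (relevance ratio 1.0,
recall ratio about 0.67). By the same definitions, the "very good system" described above
would correspond, for instance, to 48 announcements of which 36 are of interest, when the
file holds 40 relevant documents.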

Fortunately, the relevance and recall ratios can both be raised, although never to 100
per cent, by careful attention to the interest profile. By tabulating the responses that
the user has fed back into the system, the operator can determine the profiles that need
improvement. Successive tabulations and responses to user questionnaires reveal the
success of the improvement effort. Furthermore, some users might be satisfied with one of
the extremes: either a broad announcement service giving all the documents the user can
absorb, or a narrow selection of particularly interesting documents. The effort required
to optimize the interest profile is the price paid for not having to look at every single
announcement in an abstract journal with thousands of entries.


6.  TREND  TOWARD  STANDARD  PROFILES 

When one examines the SDI systems mentioned so far, it is obvious that they possess
certain features that are undesirable in the framework of providing information service to
very large numbers of users. For one thing, each new user enrolled in the system adds to the
requirements for computer time. Depending on the computer program, this increase need not
be linear with number of users; nevertheless, the added cost and availability of computer
time must be considered in planning any SDI system expansion. User turnover can be high in
an SDI system and updating of the user profiles is a constant activity, again adding to
computer usage. Besides computer costs, professional assistance in structuring interest
profiles increases with number of users. The effort may well be justified in relation to
the value of the SDI service, but the availability of professional personnel may be a
problem.

One solution is the "group profile". Identical in every other respect to the individual
interest profile, it selects announcements for an organizational unit, e.g., a branch or
section. The unit has the responsibility of circulating the announcements so that all its
members can select documents they wish to see. The group profile avoids duplication of
interests between individual profiles, is not affected by personnel turnover, and because
of its relative stability can be improved to an optimum level of performance more readily
than can the number of individual profiles it might replace.
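
A minimal sketch of how a group profile removes duplication follows, assuming for
simplicity that each profile is a plain set of index terms (real profiles, with their
Boolean logic, are more elaborate). The user names and terms are hypothetical.

```python
# Hypothetical individual profiles within one branch: user -> set of terms.
individual_profiles = {
    "engineer_1": {"welding", "titanium", "stainless steels"},
    "engineer_2": {"welding", "composite materials"},
    "engineer_3": {"titanium", "composite materials", "forming techniques"},
}


def group_profile(profiles):
    """Merge the individual term sets into one profile for the unit."""
    merged = set()
    for terms in profiles.values():
        merged |= terms
    return merged


branch_profile = group_profile(individual_profiles)

# Terms the match program must handle: one profile per user versus
# the single group profile that replaces them.
terms_matched_individually = sum(len(t) for t in individual_profiles.values())
terms_matched_as_group = len(branch_profile)
```

Here eight term entries spread over three profiles collapse to five in the group profile,
and the group profile is unchanged if one of the three engineers leaves the branch.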

A second evolutionary development arising from SDI is the trend toward standard topical
profiles. As with group profiles, the SDI match and print programs are continued, but
instead of tailoring a profile to a particular individual’s interests, a series of profiles
is written to select announcements according to certain topics of defined scope. The
computer-printed output for these topics is reproduced by conventional printing processes,
and the user receives copies of the particular topic listings that, together, best provide
announcements meeting his specific interests.

The rationale behind this trend to topic profiles becomes clear when we examine a
collection of individuals’ SDI profiles. We find that many users have fairly clear interests
in relatively well demarked subjects, e.g., aerodynamics, supersonic transports,
geomagnetism, welding, etc. These subjects can then be considered as topics for which
profiles might be established. Other common interests can be determined by comparison of SDI
profiles in a type of factor analysis. The resulting clusters of terms representing
interests common to a number of users can also be considered as topic profiles. Although
there is no real necessity for deciding on a simple title for such clusters, in practice a
short subject title is chosen, which is then limited as to coverage by a scope note.
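
The comparison of profiles can be sketched as follows. Note that this illustration
substitutes a much simpler technique for the factor analysis mentioned above: users are
linked when the Jaccard overlap of their term sets reaches a threshold, and the terms
shared within each linked group are offered as a candidate topic profile. The profiles, the
threshold of 0.3, and the function names are all assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical individual SDI profiles: user -> set of profile terms.
profiles = {
    "u1": {"supersonic transports", "sonic booms", "aircraft noise"},
    "u2": {"sonic booms", "aircraft noise", "jet aircraft noise"},
    "u3": {"welding", "stainless steels", "titanium"},
    "u4": {"welding", "titanium", "forming techniques"},
    "u5": {"geomagnetism"},
}


def jaccard(a, b):
    """Overlap of two term sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_topics(profiles, min_overlap=0.3):
    """Return [(users in group, terms shared by two or more of them), ...]."""
    users = list(profiles)
    linked = {u: set() for u in users}
    for a, b in combinations(users, 2):
        if jaccard(profiles[a], profiles[b]) >= min_overlap:
            linked[a].add(b)
            linked[b].add(a)

    topics, seen = [], set()
    for start in users:
        if start in seen:
            continue
        # Collect every user reachable from `start` through the links.
        group, stack = set(), [start]
        while stack:
            user = stack.pop()
            if user not in group:
                group.add(user)
                stack.extend(linked[user] - group)
        seen |= group
        if len(group) > 1:
            counts = Counter(t for u in group for t in profiles[u])
            shared = sorted(t for t, n in counts.items() if n > 1)
            topics.append((sorted(group), shared))
    return topics


topics = candidate_topics(profiles)
```

For these invented profiles two clusters emerge, one on aircraft noise and sonic booms and
one on welding and titanium; each would still need a short subject title and a scope note.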

Selection of topics can be based on experience with bibliographic requests by potential
users of the current awareness service, and of course by consideration of the subject
content of the input documents.


7.  NASA/SCAN  PROGRAM

Typical of this new type of SDI service is the NASA/SCAN Program (Figs. 6, 7). SCAN is
an acronym for Selected Current Aerospace Notices. SCAN is a developmental program with
limited participation at present, but it offers the possibility of providing a selective
current awareness service, not to the few hundreds of individuals typical of an SDI system,
but to tens of thousands of aerospace scientists and engineers. This is possible because
SCAN is much less expensive than SDI as the consequence of transferring much of the over-all
effort from the computer operations and profile refinement activities to the traditional and
relatively inexpensive operations of printing and sorting.

SCAN inherits the great flexibility of SDI in possessing the capability of modifying the
scope of topics through profile changes and of adding or deleting topics at will. However,
this flexibility cannot be used arbitrarily in a system striving for both economy and user
satisfaction. Choosing the catalog of topics to offer potential users requires a tradeoff
between a number of factors: (1) computer usage, which increases with the number of topics;
(2) reproduction and sorting effort, which increases with the number of topics and number of
users; (3) user satisfaction, which increases with increasing number of topics as the user’s
interests can then be correlated more closely with a limited number of topics. A decision
on a particular topic thus includes consideration of the number of users having common
interests, the extent to which users’ specific interests can be met by a finite number of
topics, the number of announcements we desire to set as a minimum for a topic per issue
output, and the maximum number of announcements we will accept for a topic output. Too many
announcements force the user to spend an excessive amount of time reviewing his lists of
notifications, the solution being to split the topic into more specific coverage.
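
The last two considerations amount to a simple check on the output of each issue. In the
sketch below the minimum and maximum are arbitrary illustrative values, the counts are
invented, and the action suggested for a topic below the minimum (review it for broadening
or combination) is an assumption; the text itself prescribes only the splitting of
oversized topics.

```python
def review_topics(counts, minimum, maximum):
    """Classify topics by the number of announcements in one issue.

    `counts` maps topic title -> announcements selected for that issue.
    """
    actions = {}
    for topic, n in counts.items():
        if n > maximum:
            actions[topic] = "split"    # too long for the user to review
        elif n < minimum:
            actions[topic] = "review"   # too thin: assumed candidate for broadening
        else:
            actions[topic] = "keep"
    return actions


# Invented counts for one issue; the thresholds are illustrative only.
issue_counts = {
    "Supersonic Transports": 140,
    "Clear Air Turbulence": 12,
    "Geomagnetism": 2,
}
actions = review_topics(issue_counts, minimum=5, maximum=100)
```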

As an illustration of flexibility of the present NASA SCAN service, topics include
Supersonic Transports, Clear Air Turbulence, and Aircraft Noise and Sonic Boom. The latter
two topics provide the user who has these very specific interests with only the
announcements he wishes to see, while the Supersonic Transports topic provides a much
broader range of coverage. This flexibility extends through the SCAN topics, which can
overlap in coverage and can announce the same document under a number of appropriate
headings, permitting the user to match his specific or broad interests by a minimum of
notification listings.

The  notification  listings  are  prepared  by  offset  reproduction  of  the  master  computer 
printout  and  are  then  sorted  by  the  requirements  for  numbers  of  copies  of  each  topic  as 
submitted  by  the  participating  organizations  for  their  individual  users.  Thus,  the  sorting 
effort  is  distributed,  with  the  local  participating  organization  having  the  responsibility 
for  maintaining  user  records  and  sorting  and  distributing  the  incoming  SCAN  notification 
listings. 
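
The copy requirements submitted by the participating organizations reduce to a simple
count, sketched below. The organization names, user names, and record layout are
hypothetical.

```python
from collections import Counter

# Hypothetical user records kept by two participating organizations:
# organization -> {user: [topics the user receives]}.
user_records = {
    "Center A": {
        "smith": ["Supersonic Transports", "Aircraft Noise and Sonic Boom"],
        "jones": ["Supersonic Transports"],
    },
    "Center B": {
        "brown": ["Clear Air Turbulence", "Supersonic Transports"],
    },
}


def copies_required(records):
    """Return organization -> Counter of copies needed for each topic."""
    return {
        org: Counter(topic for topics in users.values() for topic in topics)
        for org, users in records.items()
    }


per_org = copies_required(user_records)

# Total print run per topic for the central offset reproduction.
print_run = sum(per_org.values(), Counter())
```

The central operation needs only `print_run` and the per-organization counts; the names of
individual users stay with the local organization, which performs the final sorting.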

User participation in SCAN optimization is important, but is not accomplished in the
same way as in SDI. The user need not mark every announcement as to his interest, there
being no provision for constant feedback as in SDI. Brief questionnaires as to the
relevance of announcements and solicitation of comments on the desirability of creating
new topics or combining several existing ones, or splitting one with too broad coverage,
provide adequate feedback for optimizing the relatively stable SCAN profiles.

SCAN appears to be the path that current awareness services will take to provide service
to very large numbers of users. Several U.S. Government agencies are testing programs much
like SCAN. Projected costs for large scale SCAN programs appear to lie in the range of $10
to $20 per user per annum, again depending on the details of the service provided.
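
The cost figures quoted in this paper can be set side by side with a little arithmetic.
The sketch below assumes that costs scale linearly with the number of users, which the text
does not claim, and takes the 50,000 users mentioned in the discussion as an example
population; the list-type range is derived by applying 60 to 70 per cent to the card-type
range.

```python
CARD_COST = (100, 150)     # dollars per user per annum, card-type SDI
LIST_PERCENT = (60, 70)    # list-type cost as a percentage of card-type cost
SCAN_COST = (10, 20)       # dollars per user per annum, large-scale SCAN

# Lowest percentage of the lowest card cost up to the highest of the highest.
LIST_COST = (
    CARD_COST[0] * LIST_PERCENT[0] / 100,
    CARD_COST[1] * LIST_PERCENT[1] / 100,
)


def annual_cost(per_user_range, users):
    """Scale a (low, high) per-user cost range to a whole user population."""
    low, high = per_user_range
    return low * users, high * users


users = 50_000  # example population (assumed linear scaling)
budget = {
    "card-type SDI": annual_cost(CARD_COST, users),
    "list-type SDI": annual_cost(LIST_COST, users),
    "SCAN": annual_cost(SCAN_COST, users),
}
for service, (low, high) in budget.items():
    print(f"{service}: ${low:,.0f} to ${high:,.0f} per annum")
```

On these assumptions a list-type system costs $60 to $105 per user per annum, and the
per-user cost of SCAN is lower than that of a card-type system by a factor of between five
and fifteen.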


8.  AVAILABILITY  OF  SDI  PROGRAMS 

Practical  aspects  of  establishing  a  selective  dissemination  service  include  obtaining  the 
computer  program  to  accomplish  the  SDI  match  and  printing.  The  program  will  vary  with  the 
computer  to  be  used  and  with  the  type  of  announcements  being  designed.  Organizations  with 
the  requisite  programming  staff  and  the  computer  testing  capabilities  may  design  and  write 
their  own  program.  The  advantages  are  possible  high  efficiency  and  complete  understanding, 
obtained  from  actual  experience,  of  the  full  potentialities  of  the  program.  Programming  an 
SDI  system  can  be  a  very  large  effort,  however,  and  the  over-all  program  writing  and  testing 
can  cover  a  long  period  of  time. 

If the SDI system design is tied to an already existing computer, a program for SDI
service may already have been written and be available from the computer manufacturer or
from associations of users of that particular computer. While use of an existing program
may restrain the design of an SDI system to some extent, this may be less of a restraint
in practice than it may seem at first thought. Furthermore, it may be easier to modify an
existing program than to write an original one.


9.  SDI  INPUT 

We commonly think of an SDI service as based on an organization’s own document accessioning
and indexing activities. However, bibliographic data on computer tapes are increasingly
available in certain subject areas. Certain organizations, the Engineering Index being an
example, are in the early phases of such activities, with tapes on plastics and electronics
being issued to a limited number of companies on a contract basis. Among professional
societies, the American Chemical Society has advanced to the stage of offering a variety of
index and bibliographic data on magnetic tape for general purchase. Commercial information
science and technology firms also sell computer tapes containing bibliographic data covering
various subject areas.


10.  PURCHASE  OF  SDI  SERVICE 

As  an  alternative  to  in-house  operation  of  an  SDI  program,  the  purchase  of  such  a  service 
may  be  considered.  Societies  and  commercial  firms  that  sell  bibliographic  data  on  tapes 
will,  as  an  alternative,  run  the  tapes  on  their  own  computers  against  the  SDI  profiles 
supplied  by  the  customer.  At  least  one  U.S.  firm  offers  selected  references  to  a  wide 
variety  of  the  journal  and  patent  literature  based  on  a  fee  schedule.  The  customer  may 
request  announcements  of  all  current  documents  published  by  a  given  author  or  a  specified 
organization,  or  having  certain  keywords  in  the  title,  or  that  have  cited  a  previous 
reference  or  author. 
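
The four kinds of request listed above can be sketched as a single matching test. The
record layout, the class and field names, and the sample document are hypothetical; a
commercial service would of course apply its own file formats and matching rules.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    authors: list
    organization: str
    cited_refs: list = field(default_factory=list)


@dataclass
class Request:
    authors: set = field(default_factory=set)          # documents by these authors
    organizations: set = field(default_factory=set)    # documents from these organizations
    title_keywords: set = field(default_factory=set)   # keywords wanted in the title
    cited: set = field(default_factory=set)            # earlier references being cited


def matches(doc, req):
    """True if the document satisfies at least one criterion of the request."""
    title_words = set(doc.title.lower().split())
    keywords = {k.lower() for k in req.title_keywords}
    return bool(
        req.authors & set(doc.authors)
        or doc.organization in req.organizations
        or keywords & title_words
        or req.cited & set(doc.cited_refs)
    )


# Invented document and requests.
doc = Document(
    title="Optical radar for range finding",
    authors=["Doe, J."],
    organization="Example Laboratories",
    cited_refs=["Roe 1965"],
)
by_keyword = matches(doc, Request(title_keywords={"Radar"}))
by_citation = matches(doc, Request(cited={"Roe 1965"}))
unrelated = matches(doc, Request(authors={"Poe, E."}, title_keywords={"welding"}))
```

The criteria are combined here with a logical OR, so that any one of them selects the
document; requiring several at once would be the Boolean intersection discussed in
Section 5.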


11.  SELECTIVE  DISSEMINATION  OF  DOCUMENTS  (SDD) 

An alternative or adjunct to the dissemination of bibliographic references is the
selective distribution of the documents themselves directly to individual users. Large firms
and professional societies are particularly interested in this means of bringing reported
research to the attention of those individuals who can best make use of it and also in
reducing the numbers of copies of documents that must be warehoused while waiting for
requests. Documents can be matched to users by computer, using SDI-type profiles, or by a
topic distribution like SCAN. Papers of conferences sponsored by a professional society or
internal reports created within an organization are particular candidates for SDD, perhaps
in conjunction with an SDI announcement service for external reports. SDD might also be
broadened to the distribution of all accessioned documents. In this case, distribution might
be in the form of microfiche copies for economy. While such a complete SDD system has been
proposed, its benefits over SDI announcements have not been demonstrated to justify its added
cost.


12.  FUTURE  DEVELOPMENTS 

Looking ahead to the next developments in selective dissemination, we can foresee
increasing use of direct access to the computer made possible by time sharing and improved
console and display devices. The coming generation of SDI users may, instead of receiving a
listing of selected documents, sit down at the console of a computer interrogation station,
possibly located at a considerable distance from the computer, and merely press a few buttons
to identify himself and enter the code for his SDI announcements. The announcements selected
since his last such request would be displayed on the cathode ray tube screen before him. By
pressing another button, while a certain document is on the screen, he could instruct his
library, through a terminal located there, to send him a copy of the document. NASA has
under way a continuing study of the remote interrogation of large document data files known
as RECON (for remote console). Incorporation of this SDI capability is to be tested in the
near future.
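
The essential bookkeeping behind such a console request, namely displaying only what has
been selected since the user’s last request, can be sketched as follows. This is an
illustration only and does not describe RECON; the class, its methods, the user code, and
the sample references are hypothetical.

```python
from datetime import date


class AnnouncementStore:
    """Holds selected announcements and remembers each user's last request."""

    def __init__(self):
        self._selected = {}      # user code -> list of (date selected, reference)
        self._last_request = {}  # user code -> date of the previous console request

    def add(self, user, selected_on, reference):
        """Record an announcement that the profile match selected for a user."""
        self._selected.setdefault(user, []).append((selected_on, reference))

    def request(self, user, today):
        """Return the references selected since the user's last request."""
        since = self._last_request.get(user, date.min)
        new = [ref for day, ref in self._selected.get(user, [])
               if since < day <= today]
        self._last_request[user] = today
        return new


store = AnnouncementStore()
store.add("U123", date(1968, 3, 1), "REF-001")
store.add("U123", date(1968, 3, 8), "REF-002")
first = store.request("U123", today=date(1968, 3, 10))    # both earlier items
store.add("U123", date(1968, 3, 15), "REF-003")
second = store.request("U123", today=date(1968, 3, 20))   # only the newer item
```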

On-line bibliographic interrogation of the computer offers exceptional advantages in rapid
improvement of SDI profiles, as changes can be made while the output from the previous
profile is being studied. The changes in selected announcements resulting from profile
revisions can be called up from the data files immediately, making iterative testing highly
effective in optimizing profiles in comparison to the present limitations caused by batching
responses and delay in computer runs.

As current phases in SDI development progress, we may expect considerable clarification in
the interplay between (1) SDI as presently constituted, a batch process computer operation
followed by a printout of all announcements, (2) SDI as it might become with the
proliferation of on-line computer systems, and (3) SCAN as the archetype of a system for
distributing printed announcement lists to numerous users.

To look even further into the future, we must take account of the rapid advances being
made in the capacities and speed of computers, the ability of optical readers to input
full text instead of bibliographic data, the increasing capability of computers to organize
raw data, and the potential developments in display and on-line dialogue between man and the
computer. The science and technology of information is certain to advance also, so that
automatic content analysis of documents will become possible to replace or supplement the
intellectual indexing of today. Factors of document significance and relationship to users’
interests will be far more complex than today. The SDI user in the future will not merely
receive a listing of relevant documents but will be alerted, through optical display, to new
information. The information might be a condensation or formatting, in graphical form when
appropriate, of data received directly from experiments and that have not yet had a printed
existence outside the computer. The printed document will still exist in abundance, but the
SDI user will be alerted to the informational content rather than to the existence of the
document itself. Furthermore, the portions of new information will be presented, by computer
analysis of relation to the user’s interests, in order of significance and immediacy of
application.

While progress may appear to be slow at times, we are moving toward these potentialities,
the ultimate goal being the enhancement of the user’s capabilities in advancing science and
technology by a true communication of information.


The discussion on this paper follows on page 149.


[Figure 5 reproduces part of a computer-printed announcement sheet. Legible entries include
N68-15022, Cornell Aeronautical Lab., Inc., Buffalo, N.Y., "The Multi-Recompression Heater,
a New Concept for Large Scale Hypersonic Testing", Weatherston, R. C., Dec. 1967, 61 p.; and
N68-15099, Sud-Aviation, Paris (France), "Conception of the Airframe and Aerodynamic
Problems of the Aerobus", 1967, 23 p., in French. Each entry carries a subject category, a
short content note, and index terms; response blocks, such as "OF INTEREST, NOT REQUESTED",
appear opposite each announcement.]

Fig. 5  NASA list-type, three-part SDI announcement form.


160002  *1        LASER APPLICATIONS
        *2        SCAN 16-0002
    29            A023, B001
    32            ACC*Cir,TER
    40  A    01   ANTENNAS
    40  A    01   CAMERAS
    40  A    01   COMMUNICATING
    40  A    01   COMMUNICATION EQUIPMENT
    40  A    01   COMMUNICATION THEORY
    40  A    01   DOPPLER EFFECT
    40  A    01   HIGH SPEED CAMERAS
    40  A    01   IMAGE TUBES
    40  A    22   LASERS
    40  A    01   PHOTOGRAPHIC EQUIPMENT
    40  A    01   PHOTOGRAPHY
    40  A    01   RADAR
    40  A    01   RANGE ERRORS
    40  A    01   RANGE FINDERS
    40  A    01   RANGEFINDING
    40  A    01   RECEIVERS
    40  A    01   SIGNAL RECEPTION
    40  A    01   SIGNAL TRANSMISSION
    40  A    01   SPACE COMMUNICATION
    40  A    01   SPACEBORNE PHOTOGRAPHY
    40  A    01   TELECOMMUNICATION
    40  A    01   TRANSMITTERS
    40  B    01   HOLOGRAPHY
    40  B    01   LASER MODES
    40  B    01   LASER OUTPUTS
    40  B    01   OPTICAL COMMUNICATION
    40  B    01   OPTICAL RADAR
    40  B    01   WAVE FRONT RECONSTRUCTION

Fig. 6  A standard NASA/SCAN topic profile.


[Figure 7 reproduces page 1 of a NASA/SCAN Notification sheet for the topic Hydraulic and
Pneumatic Systems, a computer-printed list of announcements drawn from IAA and STAR. Each
announcement gives the title, authors, source, date, collation, index terms, and accession
number, with a box beside it. The instructions printed on the sheet read: "Order the
documents you want by checking the appropriate boxes. Then write your name and internal
mail code in the spaces below, and forward the entire sheet to your library." Spaces for
NAME and MAIL CODE follow.]

Fig. 7  NASA/SCAN notification listing.


DISCUSSION


C.O. Vernimb: Is the fact that NASA tries to do everything by computer the reason for
changing from the tailored profile to the standard profile for SDI, with consequent lack of
precision?

M.S.Day:  The  central  organisation  of  NASA  must  provide  an  SDI  service  to  50,000  people  and 
the  present  facility  does  not  permit  this  to  be  done  on  the  basis  of  tailored  profiles  so 
the  idea  of  standard  profiles  has  been  developed.  Local  centres  have  accepted  the  idea 
for  economic  reasons. 


R.J. Dubon: How does the sale of an SDI service stand with regard to the Copyright Laws?

M.S.Day:  The  NASA  SDI  service  works  only  with  titles  and  these  are  not  considered  to  be 
copyright.  If  a  system  was  set  up  which  disseminated  more  information,  e.g.  abstracts,  there 
might  be  the  possibility  of  copyright  infringements,  but  NASA’s  legal  advisers  do  not  think 
that  infringements  would  in  fact  occur,  even  in  this  circumstance. 


D.Bosman: In contrast to SDI services where users receive new information by post, the
console-terminal system has the psychological disadvantage that positive action is required
of each individual who participates in the system. Is this likely to reduce the
effectiveness of the system?

M.S.Day: NASA has been testing the use of remote consoles over a period of eighteen months
among its own technical staff. The only problem occurred when they were removed; workers
wanted them back to get information which they could not obtain in other ways.


R. Moser: (i) Does NASA send out microfiche as well as hard copy?

(ii) Is there any standardisation between NASA and the Defense Documentation Center?


M.S.Day: (i) NASA does distribute microfiche and these are standard with microfiche of the
Department of Defense and US Atomic Energy Commission. Microfiche provide the regular method
of distributing input material of all types (except copyright material) to laboratories and
agencies. The cost of each microfiche is about 10 cents.

(li)  About  30%  of  the  STAR  bulletin  consists  of  material  supplied  by  the  Defense 
Dociunentation  Center.  A  computer  tape  of  this  material  is  fed  directly  into 
the  NASA  system. 
INTERACTIVE INFORMATION PROCESSING,
RETRIEVAL,  AND  TRANSFER 

by 

Professor J.C.R. Licklider

Massachusetts  Institute  of  Technology,  U.S.A. 
SUMMARY


Describes the present status and trends of man-computer-interactive
information processing, retrieval and transfer made possible by multi-access
computers. Some of the promises and problems of interaction are examined.
The main activity in this field in the U.S.A. is the development of
hardware-software systems and subsystems. Examples are drawn from three
projects, MAC, TIP, and Intrex at the Massachusetts Institute of Technology.


1. INTRODUCTION

Face-to-face communication among people in small groups usually involves short
sentences, frequent interruptions, as many questions as declarative statements, and as
much meta-language as primary content. Each member of the group stimulates the others
and is stimulated by them; the communication is - in the current terminology of on-line
computing - “highly interactive”. In contrast, a lecture tends to be one long message
transmitted in one broad unidirectional channel; the poorer the lecture, the stronger
that tendency. An ordinary document, consisting of passive print on passive paper, is
not active and certainly not interactive at all. Moreover, the same is true of ordinary
catalogues, indexes, abstracts, and accession lists and all the other traditional aids
for finding documents - except librarians.

Librarians were joined recently by computers. The computers presented themselves at
first as clerical assistants. They proved that they could help greatly in handling the
routine chores of the library and the document room. Then they asked for and received
raises and promotions, and they took on such additional work as making KWIC indexes and
searching files of index terms. But that was just to get acquainted. What the computers
really came to do is much more revolutionary; they came to do in a new and different way
what only brains had done before - to make stored information interact.

Of course, “hardware” computers cannot do such a thing all by themselves; they must
have “software”, i.e., programs and data. They must also have input-output equipment
through which they can interact with people. And they must have human users who know how
to interact with them. Even when those requirements are met, present-day computers
cannot (for lack of storage capacity) take the places of books and journals, but they can
make available for interactive use such less voluminous information as the data of data
banks and the citations contained in journal references.

Current experiments with programmed multi-access computers and computer-stored
information are making it clear that interaction adds a very significant dimension to
information processing, retrieval, and transfer. The purpose of this paper is to communicate
some of the spirit and substance of that new dimension. The experiments and experiences
described are from three research and development projects at the Massachusetts Institute
of Technology: Project MAC, Project TIP, and Project INTREX. “MAC” stands for
“Machine-Aided Cognition” and “Multi-Access Computers”. “TIP” stands for “Technical Information
Program”. “INTREX” stands for “INformation TRansfer EXperiments”.

Research  and  development  efforts  in  the  field  of  interactive  information  processing, 
retrieval,  and  transfer  are  being  carried  out  also  in  many  other  institutions  in  the 
United  States  as  well  as  in  Western  Europe  and  the  U.S.S.R.  In  selecting  my  examples 
from the projects at M.I.T., I am following a suggestion made by the organizing committee.


2.  INTERACTIVE  INFORMATION  PROCESSING 

By  far  the  greatest  part  of  the  experience  in  interacting  directly  with  computers  -  in 
interacting  “on  line”  and  "at  the  console",  to  use  two  of  the  favorite  terms  of  the  field  - 
comes not from experiments in information transfer but from use of computers in preparing
computer  programs  and  in  solving  scientific  and  engineering  problems.  The  earliest 
digital  computers  were  programmed  on  line,  and  there  has  always  been  a  bit  of  on-line 
programming  and  "debugging"  (elimination  of  program  errors),  but  not  until  the  advent  of 
multi-access  computing,  based  on  the  technique  of  "time  sharing",  was  it  possible  for  large 
numbers of people to work as a matter of course, day after day, at computer consoles. Now
there are several hundred experienced console users at M.I.T. and many times that number
elsewhere.  It  is  something  of  an  extrapolation  to  go  from  interactive  programming  and 
problem  solving  to  interactive  information  retrieval  and  transfer,  but  the  experience  in 
the  former  is  the  main  guide  for  experiments  and  developments  in  the  latter. 

At M.I.T. there are several interactive computer systems. The most widely used are two
almost identical systems using IBM 7094 computers and a supervisory program called the
“Compatible Time-Sharing System” (CTSS). Project MAC is operating a new system based on
a Digital Equipment PDP-6 computer and is developing another new system that uses a General
Electric 645 computer and a supervisory program called “MULTICS”. There are two
interactive systems in the Lincoln Laboratory, the TX-2 computer with a time-sharing supervisor
called “APEX” and an IBM 360/67 computer with a time-sharing supervisor called “CP”, and
one in the Civil Engineering Department based on an IBM 360/40 with an auxiliary IBM
1130. However, the examples I shall use come from the 7094-CTSS systems.

Connected through a telephone switchboard to the 7094 computers are about 200 consoles.
Most of the consoles are merely typewriters - i.e., teletypewriters or computer typewriters.
A few have cathode-ray displays, typewriters, and auxiliary apparatus for graphical or
manual-control input. A new console with storage-tube display and typewriter keyboard
recently made its appearance. It is rated high because the storage tube provides
reasonably good resolution and fast display of text and/or drawings yet does not require
continual “refreshing” to maintain the picture - and because it is less expensive than
earlier consoles with cathode-ray displays.

The  procedure  for  beginning  a  session  at  the  computer  is  simple  and  has  been  described 
frequently elsewhere, so I shall assume that you (the user) have completed the formalities,
have  read  the  "mail"  (messages)  sent  to  you  through  the  computer  by  your  colleagues,  and 
are  ready  to  begin  work.  For  the  sake  of  simplicity,  I  shall  assume  that  your  console  is 
a  typewriter.  Since  CTSS  is  a  general-purpose  system  with  about  a  hundred  different  sets 
of  programs  ready  to  help  you,  what  you  do  first  depends  upon  the  kind  of  work  you  have 
before  you.  The  most  trivial  thing  to  do  -  and  therefore  a  good  introductory  task  - 
is  to  write  a  memorandum.  What  the  computer  will  do  for  you  is  make  it  easy  for  you  to 
correct  typing  mistakes  and  to  effect  editorial  corrections,  and  then  type  out  a  "clean 
copy"  of  your  memo.  In  order  to  prepare  a  memorandum,  you  call  the  writing  and  editing 
program TYPSET by typing “typset”, and then you type the name you want to give the
file  you  are  going  to  create  -  for  example,  "jsmith". 

The computer then types “W1720.2” (which means “Wait a moment.” and “It is now 2/10 of
a second past 17:20 o'clock.”) and then “R000.2 + 000.1” (which means “Ready” and “You
have thus far used 2/10 second of processor time and 1/10 second of drum-core transfer
time.”) and then “Input” (which means “I am ready for input from you.”). You start to type
the  memorandum: 

To: J.R.Smitt#h

.space

Subjeck:@

Subject: Plans for Improvement of planning Strategy

.space

Next Tuesday is the last day for###to submit your ideas


The  character  (#)  ordinarily  means  to  erase  a  character.  The  character  (@)  ordinarily 
means to erase a line. The control word “.space” means to skip a line. Your memorandum
therefore stands as:

To:  J.R. Smith 

Subject:  Plans  for  Improvement  of  planning  Strategy 

Next  Tuesday  is  the  last  day  to  submit  your  ideas 
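
The erase and kill conventions just described can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not the TYPSET program itself: the function names `clean_line` and `process_input` are hypothetical, and treating “.space” as "emit one blank line" is an assumption drawn from the phrase “skip a line”.

```python
def clean_line(raw, erase="#", kill="@"):
    """Apply the erase (#) and kill (@) characters to one typed line."""
    kept = []
    for ch in raw:
        if ch == erase:
            if kept:
                kept.pop()        # drop the character typed just before
        elif ch == kill:
            kept.clear()          # discard everything typed so far on this line
        else:
            kept.append(ch)
    return "".join(kept)


def process_input(typed_lines):
    """Clean each typed line; '.space' becomes a blank line (assumed meaning)."""
    out = []
    for raw in typed_lines:
        line = clean_line(raw)
        if line.strip().lower() == ".space":
            out.append("")
        elif line:                # a line that was killed entirely leaves nothing
            out.append(line)
    return out


typed = [
    "To: J.R.Smitt#h",
    ".space",
    "Subjeck:@",
    "Subject: Plans for Improvement of planning Strategy",
    ".space",
    "Next Tuesday is the last day for###to submit your ideas",
]
memo = process_input(typed)
print("\n".join(memo))
```

Three erase characters remove “for”, leaving the space before it, so the last line comes out as “... the last day to submit your ideas”.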

You notice that you forgot to give the date and that you need to capitalize the “p” in
“planning”. To go into “edit mode”, you press the carriage-return key twice. The computer
thereupon types “Edit”. You are supposed, of course, to know the control words (or
characters), only a few of which appear in this example. You type “i Date: June 15,
1968” (the “i” meaning “insert”), press the carriage-return key, and then, for format
control, type “.space”. Then, realizing that you want to centre the date, you type “t”
(for “go to the top of the file”) and “.centre” (for “centre the next line”). In order to
capitalize the “p” in “planning”, you type “l plan” (for “locate the character string
‘plan’”) and “c/plan/Plan” (which changes “planning” to “Planning”) and press the
carriage-return key twice to go back to the input mode and complete your memo.

And  when  you  have  completed  the  typing  -  and  have  corrected  all  your  mistakes  -  you 
file  the  memo  by  typing  “file  jsmith".  The  computer  types  “W1735.1”  and  then  "R011.3  + 
008.1".  You  type  “runoff  jsmith”.  The  computer  waits  for  you  to  put  a  fresh  sheet  of 
paper into the typewriter. You press the carriage-return key. The computer types the memo
perfectly  at  15  characters  per  second. 
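
The “W” and “R” lines that the computer types have a regular shape, so a console log could be read mechanically. The sketch below is an assumption-laden illustration: the functions `parse_wait` and `parse_ready` are hypothetical, the patterns are inferred only from the examples quoted in this paper (“W1735.1”, “R011.3 + 008.1”), and the digit after the point in the “W” line is returned uninterpreted rather than assigned a unit.

```python
import re


def parse_wait(line):
    """Split a 'Whhmm.t' line into hour, minute and the trailing digit."""
    m = re.fullmatch(r"W(\d{2})(\d{2})\.(\d)", line.strip())
    if m is None:
        raise ValueError(f"not a W line: {line!r}")
    hour, minute, tenths = (int(g) for g in m.groups())
    return {"hour": hour, "minute": minute, "tenths": tenths}


def parse_ready(line):
    """Split an 'Rppp.p + ttt.t' line into (processor, transfer) seconds."""
    m = re.fullmatch(r"R(\d+\.\d)\s*\+\s*(\d+\.\d)", line.strip())
    if m is None:
        raise ValueError(f"not an R line: {line!r}")
    return float(m.group(1)), float(m.group(2))


print(parse_wait("W1735.1"))
print(parse_ready("R011.3 + 008.1"))
```

Subtracting the figures of two successive “R” lines would give the processor and transfer time charged to the work done between them.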

Such  are  the  mechanics.  They  are  very  convenient  if  you  want  to  prepare  a  long  paper 
and  don’t  type  well. 

2.1 Programming

Most of the people sitting at consoles at M.I.T. are preparing programs. Many of them
use the programming language called “MAD” (“Michigan Algorithm Decoder”), but language
preferences vary widely. Translators or interpreters for about 25 programming languages
are available through CTSS. FORTRAN and LISP (List Programming Language) are popular.
Many of the languages are special “problem-oriented” languages, e.g., STRESS and COGO
(Coordinate Geometry) used in civil engineering and DYNAMO used in setting up simulations
involving difference equations.

Typically a programmer types the program he is preparing with the aid of an editing
program  already  in  the  system.  When  he  has  completed  enough  of  the  program  to  permit  a 
test,  he  files  the  completed  part,  translates  (compiles)  it,  and  sets  it  to  running.  To 
compile  a  MAD  program  named  SORTER,  for  example,  the  programmer  types  “MAD  SORTER".  When 
the  computer  reports  that  it  has  finished  the  compilation,  the  programmer  types  “LOADGO 
SORTER",  and  the  program  is  off  and  running. 

Of course, the program usually doesn't work properly the first time it is tested. The
trick  is  -  if  you  can’t  write  a  perfect  program  on  the  first  trial  -  to  have  the 
computer  help  you  find  the  errors.  Its  help  is  most  direct  if  the  program  you  write  is 
highly  interactive.  When  you  run  it,  it  types  (or  displays,  or  does  something)  to  you 
at  frequent  intervals,  and  you  respond  to  it.  You  can  tell  when,  and  therefore  where 
(and  often  how)  it  goes  wrong,  and  as  soon  as  you  see  the  flaw,  you  can  call  the  editor, 
correct  the  flaw,  recompile  the  program,  and  test  it  again.  If  the  fault  is  obscure, 
however,  you  may  need  to  call  a  “debugging  aid”.  It  is  another  prepared  program.  It  has 
its own set of mnemonically coded instructions, but it adopts the vocabulary you have used
in  writing  your  program  and  (if  it  is  a  sophisticated  aid)  lets  you  search  for  flaws  in 
the  “source"  programming  language  instead  of  the  “object"  code  that  actually  is  executed 
by  the  computer. 

In  any  event,  it  is  a  joy  to  prepare,  test,  and  correct  programs  on  line.  When  you  get 
used  to  interactive  programming,  you  can’t  imagine  how  programmers  could  have  persisted  so 
long in using the traditional, primitive ways - until you realize that they may have
persisted  so  long  because  it  took  so  long  to  perfect  a  program. 

2.2 Programs

Much of the programming done in Project MAC has been done to facilitate programming.
Some of it has been done to understand programming and to clarify the basic concepts of
program, data, and file structures. Some of it has been done to explore and foster
applications of interactive information processing, many of which involve users who are not,
and  need  not  be,  familiar  with  programming.  The  result  of  the  effort  is  a  vast  system  of 
programs  -  over  a  million  words  of  public  programs  and  20  to  30  million  words  of  private 
programs  -  that  are  available  all  the  time,  day  and  night,  except  for  about  6  hours  a 
week  that  are  devoted  to  maintenance.  One  of  the  main  aims  of  Project  MAC  has  been  to 
see  what  kind  of  a  community  would  grow  up  around  a  multi-access  computer  with  such  an 
accumulated software resource. Let me illustrate the range of the program resources with
a  few  examples,  and  then  let  us  consider  the  nature  of  the  community. 

One of the main service programs is MAP, a Mathematical Assistance Program (Kaplow,
Strong, and Brackett). It carries out mathematical operations for you - makes
transformations, solves equations. It handles algebra, trigonometry, differential equations,
Fourier and Laplace transforms, and so on. It plots graphs in linear or logarithmic
coordinate  systems,  whichever  you  specify.  You  do  not  have  to  know  all  about  it  to  use 
it:  it  asks  you  questions  until  it  “understands"  your  problem. 

A recently completed system of programs (Moses) solves even quite complex problems in
symbolic (indefinite, non-numerical) integration and does so about as well as, and much
faster than, a good human integrator.

A system of programs called ADMINS (Pool, Griffel, and McIntosh) facilitates the
preparation, maintenance, and use of data bases. It is used heavily in work in the social
sciences,  where  it  is  often  necessary  to  work  with  large  collections  of  fallible  or 
fragmentary  information.  ADMINS  is  designed  to  facilitate  the  transfer  from  ink-and-paper 
files  to  computer  files  without  glossing  over  irregularities  or  losing  track  of 
distinctions. 

TEACH (Weizenbaum, Fenichel, and Yochelson) teaches computer programming. It was used
last Fall in one section of the most elementary programming course at M.I.T., and the
experience  suggests  that  the  computer  can  do  most,  if  not  all,  of  the  instructing  required 
to  let  students  make  effective  use  of  computers. 

Cyrus Levinthal has developed programs that display structural diagrams of complex
molecules. When the display apparatus rotates the diagrams, one sees them as though
they were three-dimensional. With the aid of a light pen, one can move or twist or bend
or stretch the structures. Given any configuration, the computer can calculate the
electrical binding forces within the molecule. Levinthal and the computer work in
partnership - he contributing the knowledge and the intuition, it contributing the calculating
power and the ability to store large amounts of data accurately and display it in a
meaningful pattern - to solve molecular-folding problems that neither could begin to solve
alone.

OPS (amazingly, not an acronym) is a large system of programs for interactive,
incremental simulation and modeling. It was developed by Greenberger, Jones, Morris, and
Ness. OPS comes about as close as any programming system yet developed to facilitating
the basic process of thinking. First, it provides a language in which you can define
objects or entities and specify their properties and the relations you think hold among
them. Second, it lets you set into motion the situation thus described and make it
unfold, displaying whatever aspects of its behavior you select. Third, it records the
history and prepares whatever summaries you specify. And, fourth, it lets you intervene at
any time, modify anything you like, and then cause the simulation either to continue or
to start again from the beginning. In short, it lets you organize your thoughts in a
definite,  moldable,  dynamic  medium;  it  reveals  the  implications  of  your  static  description 
by  converting  it  into  observable,  moving  behavior;  and  it  lets  you  change  your  mind  as 
often  as  you  need  to  in  order  to  explore  the  consequences  of  alternative  assumptions  and 
conditions. 

2.3  The  On-Line  Community 

The foregoing paragraphs described a few of the hundreds of programs - enough of them,
I think, to convey an impression of what is meant by the phrase used earlier, “accumulated
software resource”. Those programs that have been recognized as the most generally useful
have been described and catalogued more or less well and made available to all users.
Other programs, less generally useful or less well recognized, reside in personal files,
but they too can be used by anyone who tracks down their authors and gets permission.

Thus, each user of CTSS can take advantage in his own work of pertinent efforts that his
predecessors and his colleagues have made - instead of writing again a hundred-times rewritten
program, he can take on new worlds to conquer. A similar accumulative process is beginning to
operate in the domain of data. A researcher formulates a theory, casts it into the form of a
computer-program model, collects pertinent data, and tests the theory by applying the
model  to  them.  Someone  else  comes  up  with  an  alternative  theory,  programs  it,  links  it 
to  the  first  researcher’s  files  (with  permission,  of  course),  and  sets  the  two  models 
into  competition.  Then  others  get  interested,  construct  new  models,  modify  old  ones, 
collect  more  data.  The  theories  change  but  the  data  base  accumulates,  and  the  accumulating 
data  almost  automatically  make  the  new  tests  more  comprehensive  and  less  expensive  than 
the  old  ones. 

Through  such  computer-facilitated  human  interactions,  a  new  kind  of  research  community 
is arising at M.I.T. It is of course only in an early, formative stage, and its shape
cannot  yet  be  discerned  clearly,  but  there  is  little  doubt  that  something  significant  is 
happening.  The  computer  system  is  being  used  for  communication  as  well  as  for  computing. 
People send messages to one another through the system. They learn about one another's
programs  before  the  programs  are  completed,  sometimes  even  before  the  programming  has 
begun.  They  plan  together  in  order  to  maximize  mutual  value.  They  strive  for  generality 
and  compatibility.  Programs  and  data  in  the  public  files  are  beginning  to  be  regarded  as 
publications. 

A rough measure of the interdependence of the members of the CTSS community is given
by the ratio of L, the number of links from one person's files to the files of others, to
F, the number of files one person has himself. The ratio L/F has risen from near 0 to
about 1 in about three years. With better documentation and a more convenient
file-linking scheme, it may go to 5 or even 10 during the lifetime of the new MULTICS system.
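
The ratio L/F is simple arithmetic, and a short sketch makes the definition concrete. Everything in it is illustrative: the `UserDirectory` type, the three users and their counts are invented for the example, and pooling the counts over all users to obtain one community-wide figure is an assumption, since the text defines L and F for one person.

```python
from dataclasses import dataclass


@dataclass
class UserDirectory:
    own_files: int          # F: files the person has himself
    links_to_others: int    # L: links from his files to the files of others


def interdependence(directories):
    """Community-wide L/F, pooling links and files over all users (assumed)."""
    total_links = sum(d.links_to_others for d in directories)
    total_files = sum(d.own_files for d in directories)
    if total_files == 0:
        raise ValueError("no files, so the ratio is undefined")
    return total_links / total_files


# Hypothetical counts, chosen only to land near the value of about 1
# reported in the text.
community = [
    UserDirectory(own_files=40, links_to_others=35),
    UserDirectory(own_files=25, links_to_others=30),
    UserDirectory(own_files=35, links_to_others=35),
]
print(interdependence(community))
```

A ratio of 5 or 10, as anticipated for MULTICS, would mean that a typical user reaches five or ten times as many files belonging to others as he keeps himself.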

As the local on-line community, centred upon a single multi-access computer system,
has emerged from concept into actuality, the idea of a broader, geographically distributed
community has taken form in the minds of several people. This idea involves
interconnecting several multi-access computer systems and combining their communities of users
into a supercommunity. There are evidences of interest in that thought and some incipient
action based upon it. In my expectation, geographically distributed computers and
information networks will come into being during the next decade. If they do, their impact
upon the process of information transfer may be great.

2.4 Some Conclusions Based on Project MAC's Experience

During its five years of operation, Project MAC has explored more areas of the broad
field of information processing than I can summarize here: theory of computation, theory
of  automata,  programming  language  and  systems,  large  files  and  data  bases,  architecture 
and  organization  of  multi-access  systems,  graphic  processing  and  display,  modelling  and 
simulation,  console  design  and  human  factors  in  man-computer  interaction,  networks  of 
central  and  satellite  computers,  and  diverse  applications.  Let  me,  nevertheless,  attempt 
to state what I consider the main conclusions pertinent to information storage and
retrieval: 

(a)  Information  is  a  dynamic,  living  thing,  not  properly  to  be  confined  (though  we 
have  long  been  forced  to  confine  it  thus)  within  the  passive  pages  of  a  printed 
document.

(b)  As  soon  as  information  is  freed  from  documental  bounds  and  allowed  to  take  on  the 
form of process, the complexity (as distinguished from the mere amount) of
knowledge makes itself evident. Everything one does in an active informational
environment  is  "complexity  limited". 

(c)  Man-computer  interaction  is  the  most  hopeful  of  the  available  approaches  to  the 
mastery  of  informational  complexity. 

(d)  Even  with  the  help  of  on-line  interaction,  however,  one  man  cannot  master  a  very 
large  domain  of  information.  It  will  take  cooperative  on-line  teamwork  -  in 
tight organizations or in loose communities, depending upon the natures of the
undertakings - to achieve significant solutions to the “big” problems of science,
technology,  industries,  cities,  nations,  and  alliances. 

(e)  The  basic  thing  in  the  user's  concept  of  an  interactive  information  system  is  the 
“name  space"  of  the  filing  (i.e.,  memory  or  storage)  system. 

(f)  Although  the  term  "time  sharing”  has  achieved  wide  currency,  the  sharing  of 
processor time is not fundamentally important. Much more important are memory
sharing  and  communication.  Thus,  the  aim  of  multi-access  design  should  not  be  to 
make  each  user  think  he  has  a  computer  all  to  himself;  it  should  be  to  immerse 
each user in a cooperative, interactive, computer-based community.

(g)  The  importance  of  controlled  access  to  files  can  hardly  be  overstated.  Control 
has two aspects: facilitation of authorized access and protection against
unauthorized access. Good fundamental design can foster both, but little can be done
to  ameliorate  a  basically  poor  filing  system. 

(h) The importance of fast interaction makes itself felt when a problem gets complex.
One can wait an hour for a response - or even a day or a week - if only a few
packets of information are involved in solving the problem, but waits of even a
few  seconds  are  prohibitive  if  thousands  of  factors  have  to  be  assessed  and  fitted 
into  a  pattern  before  a  hypothesis  can  be  tested. 

(i) In a community with many interests, the “general-purposeness” of the general-purpose
multi-access computer system has real meaning and significance. The system must
lend itself to a great variety of applications and serve as ready host to diverse
subsystems. Generality and open-endedness cost something, of course, but Project
MAC's experience indicates they are well worth it.

(j)  Reliable  operation  is  vital,  and  -  since  the  reliability  will  not  be  perfect  - 
effective  “back-up”  arrangements  and  recovery  procedures  are  vital,  also.  Before 
they will invest their main intellectual capital in, or entrust it to, a
multi-access computer system, people have to be confident that the system will be
available  when  they  want  it  and  that  it  will  not  lose  their  valuable  programs  and 
data. 


3.  INTERACTIVE  INFORMATION  RETRIEVAL 

Project TIP uses the facilities of the 7094-CTSS multi-access systems. The main
TIP data base is a growing collection of bibliographic data, presently from almost
100,000 journal papers in the field of physics. The TIP programs are programs for
processing the data in ways formulated by the user during his interaction with the system:
simple searches for titles containing specified terms, for example, or complex explorations
involving sorting, merging, comparison of citations, progressive definition of retrieval
specifications, and so on. The programs now in use (occasionally by many people
throughout the Institute and intensively by about 20 devotees at M.I.T. and elsewhere) are the
third revision of a system developed over about six years. They are applicable to
diverse files and have been used on personal and fiscal as well as on bibliographic files.
NEWTIP, now in a late stage of development, will extend the domain of the searching and
processing tools to a still wider class of data bases.

Because TIP is so pertinent to the interest of this Symposium, I shall give a couple
of examples from typical TIP sessions at the console. Using TIP, you type in lower case
and the computer types back to you in all capitals. To explain what some of the
abbreviations mean, I shall insert comments in parentheses. First, from the TIP User's Manual
a  very  simple  example: 

tip (You type “tip” to evoke the TIP program.)

W1019.5 (Wait. It is 10:19 and a half.)

TYPE YOUR REQUESTS.

search annals of physics v.26 to v.28

find title pion not author boyling j.b. (Find all articles with titles containing
“pion” except those by J.B. Boyling, whose work you already know.)

output print title a i and l (One-letter abbreviations are adequate for TIP words, such
as “author”, “identification”, and “location”. You could just as well have typed
“o p t a i l” to instruct TIP to type as output the specified information about the
items found.)

go (Go to work, TIP.)

ANNALS OF PHYSICS
VOLUME 26
VOLUME 27
J384 V027 P0079
DEUTERON PHOTODISINTEGRATION AND N-P CAPTURE BELOW PION
PRODUCTION THRESHOLD
PARTOVI F.
CAMBRIDGE, MASSACHUSETTS
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
LABORATORY FOR NUCLEAR SCIENCE AND PHYSICS DEPARTMENT

VOLUME 28
J384 V028 P0034
ANALYSIS OF THE PHOTOPRODUCTION OF POSITIVE PIONS
HOHLER G.
SCHMIDT W.
GERMANY
TECHNISCHE HOCHSCHULE KARLSRUHE
INSTITUT THEORETISCHE KERNPHYSIK

(No article meeting the specification was found in volume 26. One was found in volume 27.
One was found in volume 28. “J384” stands for “ANNALS OF PHYSICS”. If you ask for “pion”
you will find “pions”, also - but not if you ask for “pion*”. Several other devices
similar to that use of the asterisk may be employed in specifying the type of match.)
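
The matching rule in that comment can be sketched as follows. This is not TIP's code: `word_matches` and `find` are hypothetical names, the rule that a plain term matches any title word beginning with it while a trailing asterisk demands the exact word is inferred solely from the “pion”/“pions” remark, and the third record is invented to show the “not author” exclusion at work.

```python
import re

RECORDS = [
    {"id": "J384 V027 P0079",
     "title": "DEUTERON PHOTODISINTEGRATION AND N-P CAPTURE BELOW PION "
              "PRODUCTION THRESHOLD",
     "authors": ["PARTOVI F."]},
    {"id": "J384 V028 P0034",
     "title": "ANALYSIS OF THE PHOTOPRODUCTION OF POSITIVE PIONS",
     "authors": ["HOHLER G.", "SCHMIDT W."]},
    # Hypothetical record, present only so that the exclusion has an effect.
    {"id": "HYPOTHETICAL-1",
     "title": "AN INVENTED PAPER ON PION SCATTERING",
     "authors": ["BOYLING J.B."]},
]


def word_matches(term, word):
    """Plain term: prefix match. Term ending in '*': exact word (assumed)."""
    term, word = term.lower(), word.lower()
    if term.endswith("*"):
        return word == term[:-1]
    return word.startswith(term)


def find(records, title_term, not_author=None):
    """Records whose title has a matching word, minus the excluded author."""
    hits = []
    for rec in records:
        words = re.findall(r"[a-z0-9]+", rec["title"].lower())
        if not any(word_matches(title_term, w) for w in words):
            continue
        if not_author and not_author.lower() in (a.lower() for a in rec["authors"]):
            continue
        hits.append(rec)
    return hits


for rec in find(RECORDS, "pion", not_author="boyling j.b."):
    print(rec["id"], rec["title"])
```

Asking for “pion*” instead drops the second article, whose title contains only “PIONS”.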

Now,  suppose  you  know  two  articles  on  a  subject  in  which  you  are  interested.  They  are 
in  Physical  Review,  volume  135,  and  they  begin  on  pages  247  and  582.  You  want  to  see 
what else there is in that volume that is closely related. You use Kessler's technique
of “bibliographic coupling”:

search  phyrev  135 

f  share  b  phyrev  135  247  (Find  articles  that  share  at  least  one  bibliographic  citation 
with  the  article  identified  here. . . ) 

f share b phyrev 135 582 (...or with the article identified here. Putting the “f”s on
separate lines signifies “or”.)

o p i t a linkage (Output by printing the identification, title, and authors of each shared
citation and also the identifications of the article(s) with which the citation was
shared. “Identification” means “journal, volume, and page”.)
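
The request above amounts to a set intersection: an article is reported if its citation list overlaps that of either seed article. The sketch below illustrates that idea only. The function `coupled` is a hypothetical name, the article and reference labels are toy data rather than TIP records, and reporting a seed article as coupled to itself is an assumption.

```python
def coupled(cites, seeds):
    """Map each article to the citations it shares with each seed article.

    `cites` maps an article label to the set of references it cites.
    An article is kept if it shares at least one citation with any seed
    (the "or" of the two f-lines). A seed shares every citation with
    itself and is therefore listed too.
    """
    result = {}
    for article, refs in cites.items():
        links = {}
        for seed in seeds:
            shared = refs & cites[seed]
            if shared:
                links[seed] = sorted(shared)
        if links:
            result[article] = links
    return result


# Toy citation lists (hypothetical): X1 and X2 play the two known articles.
CITES = {
    "X1": {"r1", "r2", "r3"},
    "X2": {"r4", "r5"},
    "Y1": {"r1", "r4"},
    "Y2": {"r3"},
    "Y3": {"r9"},
}

for article, links in coupled(CITES, ["X1", "X2"]).items():
    print(article)
    for seed, shared in links.items():
        print("  SHARED LINKAGE TO", seed, shared)
```

The number of shared citations gives a rough strength of coupling: Y1 is linked to both seeds, Y2 to one, and Y3 does not appear at all.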


PHYSICAL REVIEW
VOLUME 135

J001 V135 P0247
ELECTRON SPIN DOUBLE RESONANCE STUDIES OF F CENTERS IN KCL. I
MORAN P. R.
SHARED LINKAGE TO PHYREV V135 P0247
J001 V070 P0460  J001 V091 P1071  J001 V098 P1787
J001 V102 P0151  J001 V110 P0630  J001 V114 P1245
J001 V115 P1506  J001 V118 P1024  J001 V124 P0442
J011 V022 P0989  J011 V026 P1124  J011 V029 P1692
J030 V026 P0167  J031 V032 P0775  J046 V005 P0183
J052 V008 P0299

JOOl  V135  P0316 

DOUBLE-RESONANCE  PHENOMENA  IN  THE  GASEOUS  LASER 
CULSHAW  W 

SHARED  LINKAGE  TO  PHYREV  V135  P0247 

JOOl  V070  P0460  J030  V026  P0167 

SHARED  LINKAGE  TO  PHYREV  V135  P0582 

JOOl  V076  P0833  JOOl  V107  P1559  J096  V229  P1213 

JOOl  V135  P0470 

SPIN-LATTICE  RELAXATION  OF  F  CENTERS  IN  KCL.  ISOLATED  F  CENTERS 
FELDMAN  0.  W. 

WARREN  R.  W. 

CASTLE  J.  G.,  JR. 

SHARED  LINKAGE  TO  PHYREV  V135  P0247 
JOOl  V091  P1071 

.JOOl  V135  P0582 

SPIN  RELAXATION  OF  OPTICALLY  PUMPED  CESIUM 
FRANZ  F.  A. 

LUSCHER  E« 

SHARED  LINKAGE  TO  PHYREV  V135  P0S82 

JOOl  V076  P0833  JOOl  V098  P0478  JOOl  VIOS  P1487 

JOOl  V107  P1559  JOOl  V108  P1453  JOOl  V115  P0850 

JOOl  V123  P0544  JOOl  V132  P0712  J003  V067  P08S3 

J012  V036  P0135  J012  V037  P2504  J017  V031  P0986 

J031  V034  P0589  J034  V176  P0045  JOSS  VOll  P0255 

J041  VOOl  P0052  J041  VOOl  PC054  J041  V005  P0373 

J045  V047  P0460  J046  V003  P0009  J046  VOOS  P0372 

J046  VOOS  P0009  J046  VOOS  P0529  J046  V009  POOH 

J049  V007  P0277  J052  V049  P0127  J074  V028  P0646 

J096  V229  P1213  J096  V241  P0865  J096  V246  P1522 

J096  V254  P3829  J256  V004  P0177  J273  V006  P1148 


J001 V135 P0591

STUDY OF SPIN-EXCHANGE COLLISIONS IN VAPORS OF RB85, RB87, AND
CS133 BY PARAMAGNETIC RESONANCE
MOOS H. WARREN
SANDS RICHARD H.

SHARED LINKAGE TO PHYREV V135 P0247
J001 V070 P0460

SHARED LINKAGE TO PHYREV V135 P0582
J041 V001 P0054

J001 V135 P0727

LINE SHAPES OF PARAMAGNETIC RESONANCES OF CHROMIUM IN RUBY
GRANT W. J. C.
STRANDBERG M. W. P.

SHARED LINKAGE TO PHYREV V135 P0247
J001 V091 P1071

J001 V135 P1046

FAST-PASSAGE EFFECTS IN THE NUCLEAR MAGNETIC RESONANCE OF FE57
IN PURE IRON METAL
COWAN DAVID L.
ANDERSON L. WILMER

SHARED LINKAGE TO PHYREV V135 P0247
J001 V091 P1071

J001 V135 P1068

SPIN-LATTICE RELAXATION IN FREE-RADICAL COMPLEXES
KRISHNAJI
MISRA B. N.

SHARED LINKAGE TO PHYREV V135 P0247
J001 V070 P0460

J001 V135 P1099

LOW-FIELD RELAXATION AND THE STUDY OF ULTRASLOW ATOMIC MOTIONS
BY MAGNETIC RESONANCE
SLICHTER CHARLES P.
AILION DAVID

SHARED LINKAGE TO PHYREV V135 P0247
J001 V098 P1787

J001 V135 P1498

FORCED TWO-LEVEL OSCILLATOR
SENITZKY I. R.

SHARED LINKAGE TO PHYREV V135 P0247
J001 V070 P0460

J001 V135 P1622

LATTICE SUM EVALUATIONS OF RUBY SPECTRAL PARAMETERS
ARTMAN J. O.
MURPHY JOHN C.

SHARED LINKAGE TO PHYREV V135 P0582
J046 V008 P0529

As the bibliographic data are being typed, you note that two of the coupled articles
shared many citations with one, but none with the other, of the source articles. You
note, indeed, that there were very few doubly shared citations. Picking out the two
articles that did share citations with both source articles, you call the library to see
whether or not they are available. As you wait for the connection, you decide to convert
the journal-volume-page identification into author-and-title citations, and, as a first
step, you isolate and save the shared items by typing:

f share b jl 135 247 and share b jl 135 582
(Putting both on the same line would have set up the "and" relation even if you had
not specified "and" explicitly.)

o  save  all  (As  the  output  action,  save  the  data  in  a  personal  file.) 

name save file resona biblio (Name the personal file "resona biblio".)

From that point you would ask TIP to print out the authors and titles of the articles in
resona biblio, and then you would go on to explore other ideas.
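
The two coupling searches in this scenario can be sketched in Python as operations on sets of citations. This is only an illustration under stated assumptions, not a description of how TIP was programmed: the function names are hypothetical, and each article's set contains only the citations that appear in the listing above, not its full reference list. Unlike the TIP listing, the sketch leaves the source articles themselves out of the result.

```python
from typing import Dict, List, Set

Citations = Dict[str, Set[str]]

# Abridged from the sample output (Physical Review, volume 135); keys are starting pages.
CITATIONS: Citations = {
    "P0247": {"J001 V070 P0460", "J001 V091 P1071", "J030 V026 P0167"},
    "P0582": {"J001 V076 P0833", "J001 V107 P1559",
              "J041 V001 P0054", "J096 V229 P1213"},
    "P0316": {"J001 V070 P0460", "J030 V026 P0167", "J001 V076 P0833",
              "J001 V107 P1559", "J096 V229 P1213"},
    "P0470": {"J001 V091 P1071"},
    "P0591": {"J001 V070 P0460", "J041 V001 P0054"},
}


def linkage(citations: Citations, sources: List[str]) -> Dict[str, Dict[str, Set[str]]]:
    """For each non-source article, the citations it shares with each source."""
    result: Dict[str, Dict[str, Set[str]]] = {}
    for article, refs in citations.items():
        if article in sources:
            continue  # the sketch reports only the other articles
        links = {s: refs & citations[s] for s in sources}
        links = {s: common for s, common in links.items() if common}
        if links:
            result[article] = links
    return result


def coupled_any(citations: Citations, sources: List[str]) -> List[str]:
    """The "or" search: articles sharing a citation with at least one source."""
    return sorted(linkage(citations, sources))


def coupled_all(citations: Citations, sources: List[str]) -> List[str]:
    """The "and" search: articles sharing a citation with every source."""
    found = linkage(citations, sources)
    return sorted(a for a, links in found.items() if len(links) == len(sources))
```

With this abridged data, the "or" search over pages 247 and 582 returns the articles at pages 316, 470, and 591, and the "and" search keeps only those at pages 316 and 591, the two articles that share citations with both sources.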

Even though the foregoing examples exercised only a few of the TIP commands, they
probably provided enough "scenario" to convey a notion of how one works with TIP. The
TIP commands may be combined in many different patterns. Users develop ingenious
strategies for filtering out irrelevant articles without losing the ones they want.
Having found a good search pattern, however complex, one simply names it and applies it
thereafter by merely typing the name. Or, having found a particularly rich collection of
references, he preserves them in a personal file for future use. If there is too much
output for the typewriter to handle, he stores the output in a personal file and requests
that it be printed on a fast off-line printer and delivered by messenger or mail. If the
TIP data base does not contain all the material he wants to use, there is a way for him
to create TIP-processible files of his own.

To understand TIP as an experiment, one must shift from the user's to the designer's
point of view. The designer can create retrieval tools to the limit of his imagination,
but he must make empirical tests to find out what works and what does not. The tests have
to involve "real" users. It is essential to record and analyze what the real users do.

While the TIP programs that we have described make searches and print lists, other TIP
programs take notes. They record the identity of each user and, in chronological order,
the name of each TIP command he issues and a summary statement of its result. At any
time, a TIP user can type a complaint or a praiseful comment and be sure that what he says
is recorded in the computer store for Dr. Kessler's benefit. The data thus collected are
periodically analyzed, and modifications and adjustments are continually made. Indeed,
TIP has developed through a process of guided evolution, and NEWTIP can perhaps be regarded
as a guided mutation.

While TIP's users sit at consoles under programmed experimental scrutiny, they do
substantial work. A book and two review articles have been based on TIP
literature searches, and TIP is being used in several studies of the flow and dynamics of
the scientific literature. The TIP programs were used to prepare and print a catalogue
of the books in the Student Centre Library, and the programs are being used to collect,
process, and prepare for publication a catalogue of the journal and periodical holdings
of the M.I.T. libraries. Informal studies were recently undertaken with the American
Institute of Physics and the National Library of Medicine to explore problems of
operational as distinguished from experimental application.

Thus, at least some of the value of on-line interaction in information retrieval is
being exploited. But the real value is yet to come. Most of the work done thus far with
TIP has been severely hampered by the slow pace of typewritten output, even when the
typewriter is driven at its top speed by the computer. (One of Brown's searches, for which
the computer required a scant 4 minutes of processing time, took 2 hours and 20 minutes
of output typing.) Now we are looking forward eagerly to cathode-ray displays. They
will make it possible for the first time to explore fully the possibilities of interactive
information retrieval.


4.  INTERACTIVE  INFORMATION  TRANSFER 

Whereas Projects MAC and TIP are old enough to be working on second-generation systems,
Project INTREX is too young to have completed its first. In discussing interactive
information transfer, therefore, I shall have to describe aims and aspirations rather than
completed experiments or current experiences.

The purpose of Project INTREX is to conduct experiments that will clarify design
objectives, methods, and techniques for information-transfer systems of about 1975.
Emphasis is placed on the word "experiments". The project is working on a "system", it
is true, but the system is being created to support the experiments. Experiments have
been planned in four main areas:

(a) bibliographic access,

(b)  physical  access, 

(c)  fact  retrieval,  and 

(d)  network  integration. 

Thus far the project has concentrated on the first and second areas, progress in which I
shall describe briefly. When work is undertaken in the third area, it will deal with
computer-program methods of deriving, from formal data representations and/or from
natural-language text, definite answers to specific questions. In the fourth area, the aim will
be to interrelate M.I.T.'s computer-facilitated information-transfer system with systems
in other universities, in government, and in industry. But obviously the third and fourth
areas will not become critical until work in the first and second areas has come near to
fruition.

4.1  Bibliographic  Access 

The purpose of the part of an information-transfer system that deals with "bibliographic
access", of course, is to take the user from the stage in which he has only a nebulous
idea of what he wants to the stage in which he holds in his hand the accession numbers
(or equivalent identifiers) of the documents that will satisfy his (now more sharply
defined) informational requirements. Most of the INTREX effort towards that end is centred
upon the concept of the computer-based "augmented catalogue". It is computer-based in
that it resides within the store of a multi-access computer and is interrogated from
consoles. It is augmented in that it contains much more information about each document
than does the traditional card catalogue, and also in that it deals with journal articles,
theses, and reports as well as books.

Members of Project INTREX are preparing a computer-processible catalogue (data base)
that will contain about 50 "fields" of information about each of approximately 10,000
documents in materials science and engineering. The 50 fields include all the conventional
bibliographic data, such as author(s), title, affiliation(s), abstract, and key words or
descriptors. Beyond those conventional data, the 50 include such things as a description
of the intended audience, an estimate of the level of difficulty, and "feedback" comments
submitted by knowledgeable users. The reason for including so many kinds of information
is not that anyone is sure they will all be helpful; it is to determine which ones
actually are helpful enough to warrant inclusion in a future operational system.
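
As a rough illustration of such a record, the sketch below models a handful of the fields named above in Python, together with a simple tally of which fields users consult. Everything in it is an assumption made for illustration: the class and attribute names are hypothetical, only a few of the roughly 50 fields are shown, and the source does not say how field usefulness was to be measured.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogueRecord:
    """One entry of a hypothetical augmented catalogue (a few fields only)."""
    accession_number: str
    title: str
    authors: List[str]
    affiliations: List[str] = field(default_factory=list)
    abstract: str = ""
    key_words: List[str] = field(default_factory=list)
    intended_audience: str = ""
    difficulty_level: str = ""
    feedback_comments: List[str] = field(default_factory=list)


class FieldUsageLog:
    """Counts how often each field is consulted (an assumed usage measure)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def read(self, record: CatalogueRecord, field_name: str):
        """Return the requested field and note that it was consulted."""
        value = getattr(record, field_name)
        self.counts[field_name] += 1
        return value

    def ranking(self) -> List[str]:
        """Field names ordered from most to least consulted."""
        return [name for name, _ in self.counts.most_common()]
```

A tally of this kind is one simple way an experimenter might compare fields; it stands in for whatever measures of helpfulness the project actually adopts.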

The 10,000 documents are being carefully selected to cover areas in which research is
especially active at M.I.T. and in which there are researchers who will contribute to the
planned experiments. Much of the selection is being done by the research people themselves.

The first experiments will be made with one of the 7094-CTSS multi-access computer
systems, but with special consoles designed, and now being constructed, in the Electronic
Systems Laboratory, where much of the INTREX research and development is being carried
out. The initial system of catalogue-processing programs has been completed, and a more
advanced system is being prepared for use with the consoles, perhaps as early as this
coming summer. Consoles will be located in the Materials Science and Engineering Centre
and the Engineering Library, and the experiments will be conducted within the context of
actual use.

4.2 Physical Access

Bibliographic access must of course lead directly to physical access, to actual
possession of the required substantive (as distinguished from bibliographic) information.
Ideally, the substantive information would be stored in and delivered through the computer
system that handled the bibliographic information - and would be available to the
computer's processor for analysis and transformation. Limitations of the present
technology, however, make digital storage and processing of a library-sized corpus
uneconomic. The course being followed by Project INTREX is therefore to hold the
substantive documents themselves in a non-digital microform storage system associated with the
computer, and to use the computer - in which, we now assume, the bibliographic
identifications of the required documents have already been specified - to pick out the
identified documents and to execute their delivery to the user.

Accordingly, images of the pages of the 10,000 documents are being made in microfiche,
and a computer-controlled subsystem for picking out and scanning selected pages is being
constructed. The electrical signals derived by the scanner will be transmitted through
coaxial cables to consoles in the Centre for Materials Science and Engineering and there
restored to the form of a readable image, either "soft copy" (ephemeral) or "hard copy"
(permanent). Equipment for several of the operations involved has been purchased or built
and is ready for test. Equipment for the other operations is under procurement or
development. The over-all design provides for rapid, guaranteed physical access to any
document selected through the bibliographic-access system - and for delivery of that
document directly to the location from which the retrieval operation was initiated.

Plans  call  for  experimental  investigation  of  such  interrelated  factors  as  the  speed, 
the  form,  the  resolution,  and  the  cost  of  physical  access.  The  experimental  equipment 
will  provide  for  fast  delivery  of  sharp  images,  either  hard  or  soft,  but  in  some  of  the 
tests  controlled  delays  will  be  introduced  and  the  resolution  of  the  reconstructed  images 
will  intentionally  be  degraded.  By  varying  the  parameters  and  making  measurements  of 
preference  and  performance  under  conditions  of  actual  use,  optimal  engineering  compromises 
will  be  approached  and  design  objectives  for  operational  systems  will  be  formulated. 


5.  ADVANCED  EXPERIMENTS  IN  A  LIBRARY  CONTEXT 

Looking beyond the experiments with the 10,000-item collection in materials science
and engineering, Project INTREX is conducting design studies that postulate a corpus of a
million documents. At the same time, the M.I.T. Engineering Library is being reconstructed
in such a way as to provide for simultaneous operations in conventional and computer-based
modes. Card catalogues, book stacks, reading tables, microform equipment, and computer
consoles will be brought together in an arrangement designed for advanced experiments in
an operational library setting.


6.  SYNTHESIS  AND  PROSPECT 

Throughout MAC, TIP, and INTREX, and indeed throughout M.I.T., there is a feeling that
a great and fundamental change is taking place in the way men relate to information. The
change has not yet progressed very far. Its effects are not yet pronounced. Nevertheless,
one can see signs enough to tell that the change involves the rules by which the game is
played.

The force behind the change is the computer, of course, but it is not the same computer
we have known these last 20 years. It is not the lightning calculator, not the
indefatigable clerk. It is the computer cast as the mouldable and retentive, yet dynamic
medium - the medium within which one can create and preserve the most complex and subtle
patterns and through which he can make those patterns operate (as programs) upon other
patterns (data) derived from nature or the works of other men.

To almost every imaginative mind that has sensed the power of the computer as an
interactive medium, it is obvious that it will change the very nature of libraries and
information systems in the years to come. At the same time, it is clear that those years will
be many - for many years of exploring and experimenting, many years of programming and
debugging, and many years of developing and testing stand between us and the effective
harnessing of the computer's power.

For  the  next  few  years,  mainly  because  of  the  limitations  of  memory  capacity  that  have 
been  mentioned,  we  shall  have  to  be  satisfied  -  in  library  applications  -  with  direct, 
on-line  interaction  with  bibliographic  information,  followed  by  old-fashioned  reading  of 
substantive  contents.  I  think  that  interactive  bibliographic  searches  will  prove 
effective,  even  in  relation  to  their  cost,  in  locating  pertinent  substantive  information. 
On  the  other  hand,  I  think  they  will  fall  far  short  of  solving  the  basic  problem  of 
information  transfer. 

The basic problem arises, I believe, after a person has the required documents on his
desk. It arises when he tries to transfer the substance of those documents across what
West Churchman has called the "brain-desk barrier". That is when a person really needs
the help of the computer.

To solve the fundamental problem it is necessary to make significant advances in the
representation of knowledge and in the processing of languages, both natural and formal.
It is necessary to convert substantive as well as bibliographic information into
computer-processible form and to store it in, and interact with it through, sophisticatedly
programmed computers. We shall doubtless not solve the fundamental problem in the near
future, but at last we can work on it. In my opinion, it is the problem that deserves
our best and greatest efforts.


DISCUSSION

D. Bosman: What is the intellectual level of the people who use the MAC system?

J.C.R. Licklider: The system is used a lot by professors, research students and graduates
on the academic side and also by others such as deans and administrative staff who find
it useful for record-keeping. Only a few undergraduates have access to the system.


S. Skoumal:  What  are  the  difficulties  about  MAC  which  make  it  unattractive  for  commercial 
exploitation  by  private  operators? 

J.C.R. Licklider: The present MAC system operates on out-of-date hardware which makes it
inefficient,  but  a  commercial  firm  could  almost  certainly  develop  an  efficient  system  and 
sell  its  services. 

The  main  problem  is  that  it  takes  a  very  long  time  to  develop  the  software  for  the  system 
and  during  this  time  the  efficiency  of  the  hardware  available  will  have  improved  very 
considerably  so  that  one  tends  to  be  operating  always  on  outdated  hardware. 


I. Gabelman: What is the status of the MULTICS system?

J.C.R. Licklider: MULTICS (Multiplexed Information and Computation System) operates on a
GE 645 computer with associated software. At present it operates too slowly and current
work is aimed at speeding it up. The system has already been under development for about
three years so that by the time it reaches the level of operation of the CTSS (Compatible
Time-Sharing System) the hardware will be out-of-date.


T. Einsele: Is the MULTICS system designed for a special group of users?

J.C.R. Licklider: We are developing the protocol for deciding on a community of users.
The system will be available to various small computers each of which will serve as the
interface with the MULTICS system.


R. Stark: Is there an index of the programs used in Project MAC and can this be made
available to those interested?

J.C.R. Licklider: There is a program manual which contains text of all the programs and
this could be made available at the cost of copying but it would be very difficult to
utilise as about twenty-five different programming languages have been used.


SUMMARY


The problem of the man-machine interface is traced back to the time
when the first computers were designed. In overcoming the problems of
the interface the cathode-ray tube display is of prime importance. Using
the display screen it is possible to transmit almost instantaneously to
man alphanumeric text, black and white shading, scaled shading, coloured
pictures and moving pictures. The mathematical theory of automata is
opening up new ways of examining the problems in a formal manner.
Solution to many of the problems at the interface, however, still awaits
better knowledge of how information processing takes place within the
human nervous system.


MAN-MACHINE-INTERFACE
W. Handler 


1.  INTRODUCTION 

The subject "Man-Machine-Interface" is a very broad one and some of its aspects cannot
be considered here. Much has been written elsewhere from the more linguistic or
symbolization point of view. I shall restrict myself to some general questions about the
interface which seem to me to be important.

There is a very interesting novel by Samuel Butler written in 1872. The name of this
novel is "Erewhon", which can be read almost inverted as "nowhere". The book tells of a
traveller who discovers a strange country far behind some mountains. He becomes acquainted
with the natives there and is surprised to find deserted railways and other evidence of
decayed technological equipment. None of it is in operation. The original technological
world seems to have died out. When people notice a watch belonging to the hero of our
story they accuse him of violating their laws. The reason is that the people had a highly
developed technology long before the arrival of our informant. One day the machines (we
would say to-day: the automata) rebelled. They had become conscious of their existence and
fought against men. Finally the people succeeded in defeating the machines by an extreme
effort. From this day on all technology had been banned.

Let  me  mention  only  that  our  hero  finally  escaped  alive. 

I have told you this story only in order to make clear that the problems of relating man
and machine seem to have come up even before our century. We now realize that the
interdependence of man and machine can be very close. Indeed, I fully agree with Dr. Licklider
and other scientists of MIT who speak of an emerging man-computer community which may
eventually embrace a vast network of different computers and human users.

However,  the  question  as  to  whether  the  machine  has  a  consciousness  or  tenders  any 
friendly  feelings  towards  us  is  fortunately  fading  out  of  the  discussion.  We  are  using 
the  machine  and  especially  computers  in  a  strictly  pragmatic  way  whereby  we  are  possibly 
questioning  our  social  and  moral  responsibility  with  respect  to  its  use. 


2.  HISTORY  OF  MAN-COMPUTER  RELATIONS 

The  first  computers  designed  more  than  20  years  ago  were  very  much  characterized  by  the 
notion  of  entirely  predetermined  computational  processes  or  algorithms  set  off  by  a  single 
starting  signal.  Some  years  later  the  computer  was  equipped  with  additional  switches,  among 
them  so-called  selector  switches,  for  guiding  the  course  of  the  program  along  alternative 
pathways. 

At  first  the  inventors  themselves  who  were  intimately  tied  to  their  creation  operated  the 
system.  Soon  other  people  came  to  use  the  machine  and  worked  on  the  basis  of  a  start-button 
philosophy. 


Frequently there resulted a feeling of utter resignation in the face of the microsecond.
The user could hope to trace computational processes only in a very crude, overall manner,
perhaps merely with regard to the beginning and end of his job. However, noteworthy
efforts were made repeatedly in an attempt at re-establishing a somewhat closer contact with
the ever faster growing computer.

With further increase in speed it became necessary to exclude direct user operation in
favour of batch-processing, thus saving costly computer time previously wasted on tedious
input-output procedures. Trained people undertook the task of empirical scheduling. In
this manner the problem of man-machine interface found a somewhat fictitious solution; the
real user rarely came to see the computer or press any of its buttons. Man-machine-interface
was restricted to a rather small group of trained personnel.

Consequently user and computer were alienated from each other, a small group of
technicians - the operators - providing their sole common tie and a most flexible but entirely
human interface, because man is still the most adjustable and adaptive instrument.

The  disadvantage  of  the  method  described  results  from  missing  the  rather  vital  experience 
of  observing  the  machine  undertake  your  assignment.  If  you  are  kept  at  a  distance  from  the 
computer  you  may  well  take  weeks  to  accomplish  the  objectives  which  you  might  achieve  in 
minutes  by  direct  contact  with  the  machine. 

This  kind  of  disproportion  can  be  corrected  today  by  allotting  a  substantial  number  of 
consoles  to  various  permanent  customers  in  a  time-sharing  configuration. 


3.  PICTURE-REPRESENTATIONS  OFFERED  BY  THE  COMPUTER 

Purely typewritten or teletyped communication is only one of the present possibilities
and, having originated in a pre-computer age, not necessarily the most efficient one.

Prime importance perhaps should be attached to the development of displays of the
cathode-ray-tube type which permit almost instantaneous transmission of mixed-mode text or
picture information, eliminating tedious waiting for carriage return or line feed motions
of printer or teletypewriter. Via display-screen the computer is able to offer
alternatives to be chosen effortlessly and swiftly with the aid of a lightpen, for instance. The
same is true for the Rand-tablet.

In the past most transitions to advanced techniques have been marked by their initial
tendency to simulate already existing achievements using the new methods. But when we
employ the display consoles mentioned we will have to do some fundamental rethinking.
Evidently the flow of information out of the computer can be speeded up well beyond the
limits hitherto known. We must ask for the maximal amount of information per unit of time
and the optimal manner of presentation best suited for processing by the human user.

There are several distinct categories of representation, such as alphanumeric text, black
and white shading, scaled shading, coloured pictures, moving pictures, 3-dimensional
displays. Those representations may occur separately or in different combinations with each
other.

Determination of suitable quantities or qualities of information is exceedingly difficult.
Many partially subjective factors are involved. Our judgement will also depend on the
particular topic under consideration. Graphic representations may be of insignificant value
when applied to codes of law or administrative regulations. Alphanumeric text on the other
hand is not sufficient for describing processes of industrial production or technical design.
If you interpret the list above as ordered according to quality it may be valid perhaps in
a majority of cases but certainly not in general.


4. THE FACTOR "TIME"

Another important aspect must not be disregarded. Up to now most display applications
have been of a rather static nature in spite of some rudimentary attempts at making
computer-controlled cine-films. Usually considerations have centred around the effects of
single pictures rather than taking account of the complete sequence of textual and graphic
information as a totality meant to establish dynamic contact between user and machine.

Lately a great deal has been said about lengthy waiting times leading to disturbed
contact and distracting the user from his task. Conversely an excessive reaction speed of the
computer may "overfeed" the user and cause in him what the psychologists call a "mental
block".

The  user  tacitly  assumes  the  computer  expects  reactions  without  delay  and  seems  to  feel 
pressed  for  time.  Reasonable  programming  for  time-sharing  systems  should  avoid  creating 
this  feeling. 

So far I have mentioned a number of psychological factors influencing man-machine
interface. It seems to be clear that other more physiological factors also influence the
effectiveness of man-machine-interrelations.

I remember the endeavour of my young colleagues to program the GOMOKU game, or "Five in
a Row", in the sense of an optimal CRT dialogue between man and computer as the two partners
of the game. After the computer has lost a game it retires for a certain period of time,
e.g. some seconds or even one minute, in order to learn from the situation. But in all
intervals requiring merely positioning a piece the computer does it within one millisecond.

The time the computer needs to calculate its decision is usually negligible when compared
to human reaction speeds. In most cases man is not even able to observe what the computer
does. Therefore he may not perceive which one of sometimes many pieces the computer has put
on the board (i.e. the CRT screen). In contrast the human partner will normally handle the
light pen only after a delay due to deliberation in order to place the next piece. So the
human reaction requires some seconds or even some minutes. This interval would be important
for an observing human opponent. It is generally useless to the computer.

Let  us  return  to  the  computer  projecting  its  pieces  onto  the  screen  at  its  own  fast  pace. 
In  this  case  as  in  other  cases  I  think  we  must  correct  the  behaviour  of  the  computer. 

It is not always necessary to alter the computer’s timing behaviour. Instead we
can cause the computer to mark the last piece (Fig. 1) by using a third sort of symbol
("last piece" of the computer). This provision substitutes for the typical slow behaviour
of a human partner.

Here is an example in which a symbolic representation must compensate for the human
inability to observe certain millisecond events. Perhaps we should be able to formulate a
law of interchanging time and space. In the example above space took the place of time by
the introduction of this additional third type of symbol shown.
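
The marking of the computer's last piece can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program described above: the board size, the text symbols "X", "O" and "#", and all function names are hypothetical stand-ins for the graphic symbols on the CRT screen.

```python
# Hypothetical text symbols; the original display used graphic CRT symbols.
HUMAN, COMPUTER, COMPUTER_LAST, EMPTY = "X", "O", "#", "."


def new_board(size=5):
    """Return an empty square board as a list of rows."""
    return [[EMPTY] * size for _ in range(size)]


def place_computer_piece(board, row, col):
    """Place a computer piece and mark it as the most recent one.

    The piece previously carrying the "last piece" symbol is demoted to
    the ordinary computer symbol, so exactly one piece is highlighted.
    """
    if board[row][col] != EMPTY:
        raise ValueError("square already occupied")
    for r, line in enumerate(board):
        for c, cell in enumerate(line):
            if cell == COMPUTER_LAST:
                board[r][c] = COMPUTER
    board[row][col] = COMPUTER_LAST


def render(board):
    """Return the board as text, one row per line."""
    return "\n".join(" ".join(line) for line in board)
```

The design choice mirrors the argument of the text: the move itself stays instantaneous, and the extra symbol carries, in space, the information a slow human move would have conveyed in time.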

We see that in contrast to the man-man-interface there exists an entirely unsymmetric
situation with respect to man-machine-interface. We can assume in accordance with the
theory of evolution that human beings naturally are best adapted to man-man-interface,
which is highly symmetric.


5.  THE  UNBALANCED  MAN-MACHINE-INTERFACE 

Man-machine interface is unsymmetric both with respect to quantity and with respect to
quality of the information processed. Man is able to deal with a highly complex visual
supply of 10’ bits per second (Fig. 2, the diagram originated by Prof. Keidel, Universität
Erlangen-Nürnberg). Human beings process this amount of information in a highly effective
manner. As far as we know to-day there is a hierarchy in our nervous system filtering the
information by a principle of reinforcement. This filtering process results in a residual
flow of only 10^ bits per second into our consciousness.

Preceding this region of consciousness we have at least three other levels in which the
nervous system branches the information to motor systems in order to supply the "subroutines"
there with parameters. Some of these subroutines can be simulated, like the so-called
conditioned reflex, an unskilled or built-in program. But other more complex programs - those
more skilled - are not well understood yet. There are some hypotheses like the "perceptron"
which are suited to shed some light on cognition or data reduction in the human organism.

Leaving the input to man and turning to his output, our investigations seem to show that
man is able to produce information at the rate of nearly 10’ bits per second including
gestural and other motor transmission. These reactions are initiated within the region
of consciousness at the rate of presumably 10^ bits per second as described in Fig. 2. Then
certain subroutines are triggered level by level and are supplied with parameters at the
same time in the way indicated.

Man normally has only poor ability to manipulate figures exactly without recording those
figures on a slip of paper. This process requires repeated outputs and thereafter inputs to
man. It is in their aptness for record-keeping and performing arithmetic manipulations
that computers generally exceed human capacity.

In  contrast,  man  has  the  ability  to  evaluate  intertwined  complex  structures  qualitatively, 
sometimes  perhaps  in  spite  of  the  fact  that  he  has  never  met  with  comparable  phenomena 
before. It looks rather improbable that computers will ever acquire similar facilities for
qualitative  appraisal. 


On the other hand the computer can offer an amount of coded or printed information which
can never be read by a single person. Also the computer is able to produce
program-controlled moving pictures using its enormous calculating power for creating kinds
of output man can never realise without computer aid. As an illustration of what I have in
mind you may take the snapshot (Fig. 3) of a game played on a PDP-7 display unit. The game was
invented at Cambridge University and simulates a sort of "naval battle" in which the
opponents attempt to hit each other’s "ships" (circular bright objects appearing to move in
a viscous fluid) with "rockets" (sparks of light).

We  have  mentioned  above  that  man  may  pour  out  up  to  10^  bits  of  information  per  second. 
This performance might be exceeded by the computer some day. The figure 10’ however is not
a  very  realistic  value  for  man  when  applied  to  lingual  and  grammatical  formulations  in  order 
to  express  effectively  some  of  his  thoughts.  Therefore  it  seems  to  be  more  reasonable  to 
assume  the  proper  figure  of  man’s  output  around  10^  bits  per  second  at  most. 

On the other hand no computer to-day is able to catch any T.V.-like pictures
instantaneously as we do. Our experience with pattern recognition facilities is not of the
sort we need for the serious use of such features by a computer aside from some special
applications.

To-day the computer still requires coded or at least prepared information within the
limits  of  rather  strongly  restrictive  conventions. 


6. ARE THERE SUGGESTIONS FOR IMPROVING MAN-MACHINE-INTERFACE?

The subject of this meeting, "Storage and Retrieval of Information", suggests that we try
to compare man’s ability in this field with the ability of the computer. We must acknowledge
man’s great superiority over the computer in the whole area with the exception perhaps of his
weakness in memorizing long sequences of numbers. Scarcely do we get any suggestion today
at all as to how we should really proceed with data reduction, storage and retrieval of
information in a manner analogous to the methods employed by living organisms. We have not
succeeded in finding the enormously effective principle of nature. At present - for
example - there seems to be no possibility of localizing the content of human memory. The
same applies to the processes of association and of generalization in the human brain.
These factors certainly influence man-machine-interface, but they are not part of
man-machine-interface, in a certain sense.

Let me summarize the most important facts as I see them for improving
man-machine-interface.

(a) As in supervisory programs we should install open-ended modular blocks of
prefabricated processes. These processes must be supplied with parameters automatically
after having been started (similar to the process in the organism).

(b) The predominant idea of an algorithm which has to be started and then has to run off
without taking care of any signals and parameters from outside should disappear in
favour of an adaptive conception in programming. The computer would be subject to
an internal process of evolution.

(c) Apart from other important questions such as the associative memory we should
eventually investigate the possibilities of connecting nerve fibres of man with the
computer directly in order to initiate complex subroutines in critical cases when
prompt responses are required. Related important applications come up in medicine.
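
Points (a) and (b) can be illustrated by a process that keeps accepting signals and parameters from outside after it has been started. The following is a minimal sketch, assuming a Python generator as the "prefabricated process"; the name `adaptive_process`, the `gain` parameter and the `("set_gain", value)` message form are hypothetical and serve only the illustration.

```python
def adaptive_process(gain=1.0):
    """Running-sum process whose gain can be changed while it runs.

    Each value sent in is either a number to be processed or a
    ("set_gain", value) tuple that updates the parameter in flight.
    """
    total = 0.0
    message = yield total              # wait for the first input
    while True:
        if isinstance(message, tuple) and message[0] == "set_gain":
            gain = message[1]          # parameter supplied from outside
        else:
            total += gain * message    # ordinary processing step
        message = yield total


proc = adaptive_process()
next(proc)                             # start the process
proc.send(2)                           # total is now 2.0
proc.send(("set_gain", 10.0))          # change a parameter mid-run
result = proc.send(3)                  # total is now 2.0 + 10.0 * 3
```

In contrast to an algorithm that runs off unattended once started, the process here suspends at every step and resumes with whatever the outside world supplies.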

I  should  like  to  remark  finally  that  the  problems  in  my  opinion  are  mainly  not  of  a 
conventionally  technological  kind.  Rather  we  must  understand  more  completely  the  structure 
of  organic  behaviour  and  mental  processes. 


7.  SOME  REMARKS  ABOUT  AUTOMATA  THEORY 

The question arises whether there are tools suited for dealing with man-machine-interface
in a more theoretical way. For instance the IVERSON language (APL), like other languages,
has the disadvantage of essentially disregarding the state of the automaton.

In my opinion a rigid and formal access to the subject is opened by the mathematical
theory of automata. Here the user is considered to be a finite automaton in the sense of
a working hypothesis. Finite automata are capable of occupying a finite number of states.
On reception of an input symbol they pass in a well-defined manner (from the deterministic
or stochastic point of view) from one state to another, emitting an output symbol.
Present-day theories however are still inadequate to deal successfully with the formidably
large number of states actually existing in both the computer and the human organism.
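
A finite automaton of the kind just described, which on each input symbol changes state and emits an output symbol, can be written down directly. The sketch below covers only the deterministic case; the two-state "user" and its symbols are invented for illustration and make no claim about real human behaviour.

```python
class MealyMachine:
    """Deterministic finite automaton with output (a Mealy machine)."""

    def __init__(self, transitions, start):
        # transitions maps (state, input_symbol) -> (next_state, output_symbol)
        self.transitions = transitions
        self.state = start

    def step(self, symbol):
        """Consume one input symbol, change state, return the output symbol."""
        self.state, output = self.transitions[(self.state, symbol)]
        return output


# Hypothetical example: a user who is either "idle" or "busy".
USER = {
    ("idle", "prompt"): ("busy", "typing"),
    ("idle", "wait"): ("idle", "nothing"),
    ("busy", "prompt"): ("busy", "ignored"),
    ("busy", "wait"): ("idle", "answer"),
}

user = MealyMachine(USER, start="idle")
outputs = [user.step(s) for s in ["prompt", "prompt", "wait", "wait"]]
```

The transition table has one entry per state and input symbol, which makes the difficulty named in the text concrete: the table grows with the number of states, and for a realistic user or computer that number is formidably large.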

Further evolution of Automata Theory will certainly improve upon this deficiency. There
will be the possibility of treating very complex structures with the aid of algebraic
concepts like homomorphisms, groups of automorphisms and so on.

The subject man-machine-interface is at present far from being formalized in a way
familiar  to  me.  But  nevertheless  some  advantages  of  other  applications  of  the  Theory  of 
Automata  encourage  us  to  make  efforts  in  this  direction. 

Within the framework of present-day theory the automaton for instance is called upon to
establish equivalence classes of patterns (e.g. the class of all syntactically correct
sentences of a language) or - less ambitiously - to make decisions regarding the membership
of a given pattern to one of those classes. Another part of the theory is devoted to the
design of a suitable series of experiments for determining the initial state of an automaton.
These investigations can and should be extended to the reciprocal effects of two automata on
each other - their dialogue - and to the mutual interaction of many automata - their
communication, or even their social behaviour.
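
The less ambitious task, deciding whether a given pattern belongs to a class, can be shown with a very small automaton. The following sketch assumes a toy "language" whose correct sentences have the form (article) noun verb; the lexicon, the state names and the function `is_sentence` are hypothetical and chosen only to keep the example short.

```python
# Hypothetical toy lexicon: word -> syntactic category.
LEXICON = {
    "the": "ART", "a": "ART",
    "user": "NOUN", "computer": "NOUN",
    "waits": "VERB", "answers": "VERB",
}

# Transition table of the deciding automaton: (state, category) -> next state.
TRANSITIONS = {
    ("start", "ART"): "need_noun",
    ("start", "NOUN"): "need_verb",
    ("need_noun", "NOUN"): "need_verb",
    ("need_verb", "VERB"): "done",
}

ACCEPTING = {"done"}


def is_sentence(text):
    """Decide whether text belongs to the class of correct toy sentences."""
    state = "start"
    for word in text.lower().split():
        category = LEXICON.get(word)          # None for unknown words
        state = TRANSITIONS.get((state, category))
        if state is None:                     # no transition defined: reject
            return False
    return state in ACCEPTING
```

For example, `is_sentence("The user waits")` is accepted, while `is_sentence("the waits")` is rejected because no transition leads from an article directly to a verb.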

The  axioms  and  conclusions  of  mathematical  theory  will  depend  on  the  fundamental  concepts 
underlying  our  comprehension  of  man-machine  interface. 




The other view can be called monolithic and teleologic in a way. Man and machine are
[...] mastering the environment [...]

[...] is unfortunately subject to severe practical limitations because - as
[...] - information processing takes place within the human nervous system.

Once deeper insights into these mechanisms have been gained we stand a good chance of
designing very effective man-machine interfaces on the basis of the concept mentioned.


DISCUSSION 


S. Skounal: Is the research in this area carried out only at universities or is some interest
shown by industry?

Handler: At present very little work is done and then only in universities. The interest
of industry would be welcomed.

P. Holzberger: Have any experiments been done coupling human nerve fibres with computers?

[...] co-operation between the physiologist and the
mathematician this experiment has not been attempted as far as I know.

J.P.».ittle: Are any investigations being made into the possibilities of feeding speech
directly or indirectly into the computer?

Handler: I am not aware of any investigations. Almost certainly some sort of
pre-processing of speech would be required before signals could be passed to the central
computer.


SUMMARY


The educational requirements for users and suppliers of scientific and
technical information and the steps taken to provide professional education
for various levels of attainment are discussed. Mention is made of specific
efforts made in the U.K. to provide undergraduate and postgraduate training
for users. The types of training courses available to suppliers - chartered
librarians, library assistants and information scientists - are outlined.


EDUCATION
F. Liebesny


1.  INTRODUCTION 

The very magnitude of the problems connected with the information explosion is such
that several approaches have to be made in the attempts towards overcoming them. Whether
we use such almost meaningless figures as

(a)  that  the  world’s  output  of  scientific  and  technical  articles  is  of  the  order  of 
1,000,000  per  annum; 

(b)  that  the  number  of  periodical  titles  in  the  disciplines  of  science  and  technology 
increases  by  two  every  day; 

(c) that about half of the world’s literature in those selfsame disciplines is written in
languages  other  than  English  (which  for  the  purposes  of  this  argument  includes  American 
English);  or 

(d)  that  the  periodical  literature  increases  at  a  compound  rate  of  approximately  6-7%  per 
year, 

we  are  still  left  with  the  almost  irrefutable  fact  that  all  of  us,  users  and  suppliers 
alike,  are  gradually  becoming  submerged  by  this  flood  of  paper.  Although  the  argument  of 
quality  of  the  literature  is  being  ignored  in  such  discussions,  it  is  obviously  becoming 
increasingly  difficult  to  cope  with  the  output  of  the  world’s  presses.  This  tendency  - 
which  is  certainly  not  new  -  has  led  over  the  years  towards  more  and  more  specialization, 
both in the user and supplier of the specialist literature. This has led on the one hand
to the emergence of the information scientist, and on the other hand to (albeit isolated)
endeavours to train the user in some of the more elementary forms of the techniques employed
in coping with the literature.
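
To give figure (d) some meaning, the compound growth rate can be converted into a doubling time. The short sketch below is an illustration only; the function names are assumptions, and the only input taken from the text is the rate of 6-7% per year.

```python
import math


def doubling_time(rate):
    """Years needed for a quantity to double at a compound annual rate."""
    return math.log(2) / math.log(1 + rate)


def growth_factor(rate, years):
    """Factor by which a quantity grows after the given number of years."""
    return (1 + rate) ** years


low = doubling_time(0.06)    # about 11.9 years
high = doubling_time(0.07)   # about 10.2 years
```

On these figures the periodical literature would double roughly every ten to twelve years, which is one way of expressing the "flood of paper" described above.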

The term ‘information science’ - though of comparatively recent origin - has acquired
in this short space of time several shades of meaning. Therefore in order to define the
way in which the term will be used in this paper, the activities embraced by the definition
as given in the Articles of the Institute of Information Scientists should provide some
guidance. These are:

(i) abstracting, reviewing progress and other similar technical writing;

(ii) translating scientific and technical writings;

(iii) editing such writings as emerge from (i) and (ii) above;

(iv) indexing, subject classification and retrieval of scientific and technical
information;

(v) searching scientific and technical literature, preparing bibliographies, reports,
etc.;

(vi) providing scientific and technical information and tendering advice thereon;

(vii) dissemination of information and liaison and field work for that purpose;

(viii) research on problems in information work.




It  is  obvious  from  this  recital  that  a  professional  information  scientist  should  combine 
a  considerable  knowledge  and  skill  for  the  proper  execution  of  his  work;  normally  this 
knowledge  is  obtained  from  courses  of  study  while  the  skill  should  be  derived  from 
appropriate  practical  experience  such  as  work  in  an  information  department.  In  order  to 
attain  a  corporate  membership  in  the  Institute  of  Information  Scientists  it  is  thus  necessary 
to  provide  evidence  of  both  the  required  knowledge  and  skill;  thus  for  Membership  a  candidate 
would  normally  be  expected  to  possess  a  science  degree  and  to  have  worked  at  least  five  years 
in  an  information  department. 

In  training  the  user  to  enable  him  to  deal  competently  with  the  documentation  of  his 
special  subject  field  it  would  be  unwise  to  aim  at  such  a  high  degree  of  professionalism; 
firstly,  a  detailed  training  programme  would  turn  the  user  into  an  information  scientist  and 
thus  divert  him  from  his  own  special  field;  secondly,  it  would  be  wasteful  to  impart  knowledge 
and  expertise  of  which  a  considerable  amount  would  never  be  required  by  a  specialist  user 
of  the  literature  since  the  full  training  of  a  documentalist  involves  matters  relating  to 
several disciplines; and thirdly, the time generally available for such training of the user
is not sufficient for more than a somewhat superficial approach to the many problems of
information storage and retrieval.

Therefore  the  few  attempts  that  have  been  made  to  familiarize  the  scientist  and 
technologist  with  proper  means  of  using  his  subject  literature  require  careful  study  to 
elicit  their  useful  and  successful  features.  The  results  of  these  courses  are,  however,  not 
quite  so  easy  to  establish  as  most  of  them  have  only  been  conducted  for  a  few  years  and 
it  is  thus  still  too  early  to  quantitatively  assess  this  criterion. 

Before  entering  into  a  more  detailed  discussion  of  the  courses  available  it  must  be 
stated  -  perhaps  somewhat  shamefacedly  -  that  information  on  such  educational  ventures  is 
not  very  easy  to  come  by;  it  appears  that  the  discipline  of  information  science  is  not  too 
well provided with means of keeping its practitioners informed on what is going on
elsewhere. Although there are many periodicals and even abstract journals in the fields of
librarianship, documentation, information science etc., their coverage on the educational
front is not too extensive. It was for that reason that the Fédération Internationale de
Documentation (FID), the (British) Office for Scientific and Technical Information, Aslib
and the Institute of Information Scientists organized in 1967 an International Conference
on Education for Scientific Information Work. The proceedings of this conference were
published in September 1967 by FID and its 33 papers tried to survey the international scene
by focusing on the activities in the most important countries. That this attempt was not
100% successful can be deduced from the fact that there were no contributions from the USSR
and that the Indian delegates did not attend the conference and thus take part in the
discussions. However, these Conference Proceedings seem to constitute the most comprehensive
review of the activities in educating users and suppliers of scientific information.


2.  TRAINING  OF  USER 

The  user’s  training  can  be  initiated  at  two  levels:  either  before  he  graduates  or 
afterwards. 

Training at the undergraduate stage is frequently very difficult because most of the
syllabi are so crowded that it requires considerable persuasion of the university authorities
to devote any of that precious time to such peripheral subjects as documentation.
Furthermore, there is still a great deal of that old belief that every scientist knows - or
at least should know - the literature of his own subject. This fallacious attitude ignores
completely any subsequent developments in the subject field and its literature or the use of
modern techniques in dealing with it.

Training  at  the  post-graduate  stage,  on  the  other  hand,  is  likely  to  ensure  that  the 
recipient  is  in  the  proper  frame  of  mind  to  accept  the  training  as  by  then  he  is  more 
mature and more aware of his needs with respect to documentation.

(a) Undergraduate training: In the United Kingdom organized training of students is
conducted at the Universities of Liverpool and Bradford* where the university
authorities took the initiative, and at a series of six universities^ (Edinburgh,
London (University College, and Chelsea College of Science and Technology), Oxford,
Warwick, and York) where the Office for Scientific and Technical Information is
arranging for some 500 students to receive an information service and some instruction
in its use.

(b) Postgraduate training: the main centre for education in the use of literature at
that level is undoubtedly the National Lending Library for Science and Technology,
at Boston Spa, Yorks, where courses lasting about 10 working days have been run since
1965. A specimen timetable for such a course on the use of scientific literature
is given in Appendix I.

A more recent development is now being supported by OSTI whereby retrospective
searching of the medical literature by means of the MEDLARS technique (MEDical
Literature Analysis and Retrieval System) will be made easier by providing facilities
for consultation with specially trained liaison officers. Five such officers - at
Newcastle, London, Edinburgh and two yet to be appointed - will help users in the
formulation of search profiles and will advise on availability and capacity of this
computer-tape index. Courses are also being held at the Hatfield College of
Technology.

The  above  analysis  is  certainly  not  meant  to  convey  the  impression  that  there  is  only 
one  type  of  user;  indeed  there  are  many  different  users  who  have,  however,  one  common 
feature,  viz.  an  innate  reluctance  -  or  even  inability  -  to  communicate  their  real  needs 
for  information  to  the  potential  supplier.  It  is  therefore  desirable  that  any  training 
of  the  user  should  provide  for  some  means  of  tuition  in  this  delicate  art  of  stating  his 
actual requirements and not of hiding them behind surmises as to the possible location of
the  needed  information. 


3.  TRAINING  OF  SUPPLIER 

The  education  of  the  librarian,  information  scientist,  documentalist,  etc.  is  perhaps 
somewhat  better  organized  than  that  of  the  user.  Nevertheless,  there  are  today  so  many 
different  avenues  of  training  towards  producing  a  qualified  information  scientist  that 
this  plethora  of  facilities  may  create  an  impression  of  diffusion  and  even  confusion.  In 
order  to  reduce  this  seemingly  unmanageable  pile  of  information  into  some  state  of  order 
it  may  be  advisable  to  classify  these  data  according  to  the  type  of  supplier  to  be  produced: 

3.1 Librarian:

(a) Chartered: organized full-time courses of two years’ duration are provided at several
library schools which are housed within colleges of further education, polytechnics
or universities.

Recent developments towards creating a degree in librarianship (as opposed to a
post-graduate qualification as is provided at University College, London) under the
aegis of the Council for National Academic Awards have led to the setting up of
such a course at Newcastle to commence in the autumn of 1968. Other such courses
are likely to commence with the following academic year.

(b) Assistant: it was felt some time ago that a qualification of a lower level of
competence than that of the chartered librarian should be provided for those people
having to interrupt their professional life - especially married women - or those
not wishing to attain the higher professional status. Towards this end, a Library
Assistants Certificate has been proposed; this scheme will be administered by the
City and Guilds Institute of London.

3.2 Information Scientist:

Since 1961 a post-graduate course has been run at the City University in London (formerly
the Northampton College of Advanced Technology). This two-year course is run twice weekly
(two hours each) to enable students to attain an entrance qualification for the corporate
membership of the Institute of Information Scientists. A post-graduate one-year full-time
course was started in 1963 which, since 1967, leads to an M.Sc. degree.

Similar courses have been running at the University of Sheffield Postgraduate School of
Librarianship and Information Science since 1964. At present this one-year course leads
to a Diploma, but from October 1968 it will lead to an M.Sc. degree either in Information
Studies or in Librarianship.

Owing  to  the  previously  mentioned  fact  that  about  half  of  the  world’s  literary  output 
by  scientists  and  technologists  is  written  in  languages  other  than  English  and  that  many 
of  the  readers  of  such  writings  are  only  too  rarely  qualified  to  comprehend  those  foreign 
languages, it is obviously important that the suppliers of information should possess some
proficiency  in  handling  foreign  language  material.  Therefore  many  of  these  courses  lay 
considerable  stress  on  such  ability  and  even  include  some  training  in  languages  in  their 
syllabus.  In  the  M.Sc.  course  at  the  City  University  one  examination  paper  is  devoted  to 
testing  the  required  level  of  proficiency. 

While the foregoing survey has been directed largely to activities and developments
in the United Kingdom it should not be thought that the rest of the world is standing still.
Indeed, as R.T. Bottle^ has shown, similar schemes for training the user are in operation
or being planned in many countries of Europe (to which his survey was confined) and in
Australia**.

In the user-supplier dialogue which forms the theme of this symposium it is essential
that each party should understand and appreciate the basic requirements of the other; this
can be achieved, inter alia, by proper teaching and training in the elements of storage
and retrieval of information for which usually too little time is available in the
overcrowded timetables during the university courses. The few endeavours described above are
hopeful omens that the need for a deeper understanding is being more widely realized.


The use of scientific literature - A course for research students

TIMETABLE

Day   Time         

 1.    2.00 p.m.   Tour/description/services of the National Lending Library
                   for Science and Technology (in particular Reading Room
                   and Staff Library).

 2.    9.15 a.m.   Guides to published information
                   1. Serials:  Current awareness tools
                                Abstracting journals
                                Indexing journals
                                Annual reviews
                                Review serials
      10.30 a.m.   2. Tools for student’s specified interest

 3.    9.15 a.m.   3. Reports:  Indexes
      10.30 a.m.   4. Books:    Theses
                                Annual Reports
                                Yearbooks
                                Monographs
                                Technical dictionaries
                                Language dictionaries
                                Encyclopaedias
                                Bibliographies

 4.   11.15 a.m.   Language problems

 5.   11.15 a.m.   Record keeping

 6.   11.15 a.m.   Information bureaux

 7.   11.15 a.m.   Keeping up with current literature

 8.   11.15 a.m.   Library resources in the U.K.

 9.   11.15 a.m.   Films. The National Lending Library for Science and
                   Technology. National Library of Medicine, U.S.A.

10.   11.15 a.m.   Criticism and discussion of the course.

Time not devoted to lectures will be spent on literature searching.

(N.B. As can be seen from this schedule use is made of the few available films on this topic).

From: J.Doc., Vol.22, No.1, March 1966, pp. 22-32


DISCUSSION


A. H. Holloway:  In  an  instructional  tour  for  new  entrants  to  the  British  Royal  Naval 
Scientific  Service,  it  has  been  found  that  over  half  of  these  graduates  have  never  had  any 
instruction  in  the  use  of  the  literature  and  that  after  about  six  months’  working  experience 
about  one  third  of  them  have  never  been  into  their  establishment  libraries.  At  the  request  of 
these  new  entrants  short  courses  are  being  arranged  to  demonstrate  the  information  facilities 
available  and  how  to  make  the  best  use  of  them. 


R.R. Dexter: In the U.S.A., the Institute of Aerospace Sciences has made special efforts
to  make  the  engineer  in  industry  aware  of  information  and  documentation  facilities.  The 
approach  is  to  send  an  engineer  with  experience  in  information  work  and  skilled  at  putting 
his  ideas  across  into  a  firm,  where  he  investigates  the  documentation  activities  and 
suggests  improvements  to  and  better  ways  of  using  the  system. 

F. Liebesny:  This  is  a  very  interesting  approach.  In  any  activity  of  this  kind,  full 
support  from  management  is  essential. 


SUMMING UP OF THE TIP-SYMPOSIUM
R. Brec


I  think  it  is  an  almost  impossible  task  to  summarize  the  impressions  gained  from  the 
presentation  of  sixteen  well  prepared  and  extremely  stimulating  papers  as  they  covered  a 
wide  range  of  subjects.  Besides,  in  some  cases  they  touched  topics  outside  my  own 
experience and I am therefore not entitled to give any judgements. This situation has
been made even worse as circumstances made me a victim of a faulty information transfer:
when  asked  by  a  rather  blurred  long-distance  call  to  take  over  the  chair  for  the  last 
session  -  at  least  this  is  what  I  understood  -  I  agreed,  only  to  find  out  later  on  that  I 
had  quietly  committed  myself  to  trying  to  produce  this  summary.  In  other  words:  something 
looking  initially  like  a  privilege  amounted  in  fact  to  a  sentence  to  hard  labour! 

That  is  why  I  am  asking  to  be  forgiven  for  changing  what  was  called  “summing-up”  into 
the  presentation  of  a  rather  incomplete,  thoroughly  subjective  and  probably  biased  account 
of  my  personal  impressions. 

In  so  doing  I  am  wearing  three  different  hats. 

First,  being  an  engineer  and  at  the  same  time  director  of  the  nuclear  documentation  and 
information  centre  of  the  European  Communities  (Euratom/CID)  I  am  certainly  a  supplier  of 
information,  responsible  for  the  development  of  the  largest  European  venture  in  mechanization: 
the  Euratom  Nuclear  Documentation  System,  which  is  giving  access  to  a  store  of  over  750,000
items  of  document  data  and  which  is  in  operation  for  retrospective  searches  and  for  SDI.

Secondly,  all  my  life  I  have  been  a  user  of  technical  information. 

Thirdly,  as  I  write  a  paper  every  once  in  a  while,  I  must  confess  to  being  also  an 
originator  of  such  information. 

Result:  the  conflicting  views  and  feelings  which  are  involved  in  the  wearing  of
three  different  hats  are  struggling  at  this  moment  within  me. 

To  start  with,  the  title  of  the  symposium.  I  am  afraid  this  was  more  or  less  a 
misnomer.  I  simply  missed  any  true  dialogue  between  suppliers  and  users.  Didn’t  we  in
fact  hear  yet  another  extensive  multifaceted  monologue  of  those  who  claim  to  be  suppliers
of  those  often  extremely  perishable  goods  which  it  is  customary  to  bundle  under  the  term
“scientific  and  technical  information”?  I  really  wonder  how  the  members  of  the
co-sponsoring  Avionics  Panel  feel  about  this!

On  the  other  hand,  it  is  not  surprising  that  there  was  no  real  dialogue  with  “the  user”. 
The  user  does  not  exist:  what  exists  is  a  rather  inarticulate  mass  of  users,  each  of  them 
full  of  individual  expectations,  insofar  as  they  expect  anything  at  all. 

The  first  paper  gave  me  reason  for  some  embarrassment:  it  retold  the  history  of  all  the 
brave  attempts  to  do  some  reasonable  spadework  to  improve  the  conditions  on  which  real 
progress  in  information  handling  does  depend.  These  attempts  were  at  that  time  mainly 
inspired  by  suggestions  from  AGARD/TIP  but  they  seem  to  have  been  shelved  and  forgotten. 

In  the  meantime,  their  revival  is  being  undertaken  by  bodies  like  OECD  and  UNESCO.  It
is  not  this  that  embarrasses  me  but  the  little  response  received  from  governmental
management  and  administration  in  following  up  the  recommendations  we  presented  about  8  years  ago.
One  would  assume  that  for  decision-making,  that  outstanding  governmental  privilege,  everybody
involved  would  crave  pertinent  information  on  which  to  base  decisions.  It  might  well
be,  however,  that  so-called  political  decisions  have  to  follow  a  different  pattern.


The  next  observation  -  and  this  might  be  rather  biased  -  is  how  differently  the 
scientist  or  the  engineer  tends  to  act,  depending  upon  which  position  he  assumes
momentarily:  that  of  the  user,  or  that  of  the  originator  of  information.  Being  rather 
demanding  and  not  easily  satisfied  as  a  user,  he  seems  inclined,  as  an  originator,  to 
be  forgetful  of  all  the  rules  of  good  behaviour  and  of  all  the  necessities  for  making 
sure  that  his  bit  of  information  becomes  easily  retrievable  later  on  (by  giving  it
precise  bibliographical  references,  exact  descriptive  data  for  cataloguing  and  a  decent 
abstract  of  informative  value). 

What  has  struck  me  generally  was  how  much  attention  has  been  devoted  during  this 
symposium  to  all  the  problems  of  the  hardware  involved  and  how  to  handle  it  aptly,  and 
how  little  attention  to  all  the  questions  of  quality  control  at  the  originating  level. 

It  goes  without  saying  that  in  this  respect  the  colleagues  handling  information  related
to  defence  are  hit  hardest:  disseminating  information  and  respecting  security  regulations
are  definitely  incompatible  and  often  even  contradictory.  The  high  amount  of  public
money  spent  in  this  field,  in  combination  with  the  alleged  necessity  for  secrecy,  seems  to
favour  the  research  report  as  a  means  for  the  presentation  of  the  results  achieved.  If
and  when  the  stamp  “secret”  is  added,  the  number  of  readers  is  anyway  reduced  and  so  is
the  chance  that  considerations  concerning  the  quality  of  the  contents  come  into  the  picture.
I  think  that  the  real  chance  and  hope  for  reducing  the  amount  of  garbage  lie  in
publishing  a  maximum  of  results  through  journals  of  high  standing,  the  publishers  of  which
maintain  commendable  quality  standards  for  the  articles  they  accept  for  publication.


* * * * * * *


Another  group  of  problems  concerns  the  intrinsic  value  of  information  mechanization
in  itself.  We  found  amazingly  optimistic  views  on  the  future  development  potential  of  the 
equipment,  which  sometimes  seemed  to  be  extrapolated  from  far  too  modest  statistical  data 
to  suggest  reliability  of  the  conclusions.  From  my  own  experience  I  know  that  many  a
problem  shows  its  real  and  overwhelming  dimensions  only  under  real  life  conditions  and
not  when  appraised  on  too  small  samples.  Parts  of  such  optimistic  statements  seemed  to 
be  slightly  influenced  by  the  desire  to  introduce  machinery  with  technically  interesting 
features  in  this  up-and-coming  field  of  information  handling,  without  worrying  too  much 
about  the  economics  involved.  Opposition  against  these  optimistic  views  on  the  usefulness 
of  sophisticated  machinery  for  information  handling  was  actually  voiced  in  cases  where 
manual  methods  proved  sufficiently  effective.  This  was  illustrated  by  the  very  impressive 
achievement  of  the  Netherlands  centre  which  stated  that  250  requests  per  day  constitute 
the  average  workload  of  its  manually  operated  system. 

But  I  think  that  even  this  very  admirable  feat  cannot  stop  the  trend  toward  the
introduction  of  mechanical  methods.  The  results  obtained  by  the  Euratom  System  support  this
statement,  as  does  the  firm  intention  of  some  editors,  like  those  of  Chemical  Abstracts,  to
mechanize  their  operations  and  use  to  a  full  extent  all  the  gains  in  speed,  accuracy  and
quantity  which  are  possible  with  extremely  careful  data  preparation.  It  is  remarkable,
by  the  way,  that  for  the  moment  all  operational  mechanical  systems  depend  upon  the  skilful 
combination  of  scientific  and  technical  staff  for  handling  the  phases  of  literature  subject 
control,  vocabulary  control  and  systems  development.  Part  of  this  staff  is  indispensable 
for  the  translation  of  the  customers'  natural  language  questions  into  machine  language,  for 
screening  the  results  provided  by  the  machine,  etc.  I  am  convinced  that  this  will  stay  with 
us  for  quite  a  while,  even  though  higher  degrees  of  automation  in  the  introduction  of  data 
might  prove  feasible.  I  suppose  that  for  the  often  advertised  dialogue  between  user  and
machine,  such  knowledgeable  interpreters  are  not  only  essential  but  their  employment  seems 
to  be  -  at  least  for  the  time  being  -  the  most  effective  and  the  most  economic  answer  to  the 
problem  of  ensuring  satisfactory  service  to  the  customer  without  having  to  train  large 
numbers  of  potential  users. 

Introducing  machinery  in  information  handling  unavoidably  means  introducing
centralization  of  operations  on  the  one  hand  and  partition  of  work  on  the  other.  This  is  new  for
this  field  and  needs  a  rethinking  of  relations  and  working  style.  The  capacity  of 
computer-based  information  systems  to  handle  numerous  individual  requests  and  the  still 
high  cost  of  computer  time  would  lead  to  rather  high  operating  costs,  if  untrained  users 
were  allowed  direct  access  to  the  computer.  The  customer  needs  an  interpreter,  i.e.  a 
member  of  the  information  centre  with  the  needed  amount  of  knowledge  in  both  subject 
matter  and  retrieval  processes,  to  obtain  fast  and  inexpensive  answers. 

I  feel  that  it  is  unrealistic  today  to  believe  that  large  numbers  of  potential  users
can  be  properly  trained  to  draw  full  benefit  from  mechanized  systems  by  posing  their
questions  directly  to  the  machines.  One  cannot  hope  that  the  rather  small  number  of
specialists  who  have  developed  and  who  now  operate  existing  systems  could  possibly  provide
user  training  on  a  large  scale;  failing  this  thorough  training,  the  user  is  left  to  his
own  devices  such  as  haphazard  methods  of  trial  and  error  which  not  only  cost  machine  time
but  may  also  scare  him  away  if  the  results  are  not  up  to  his  expectations;  it  is  indeed  a
common  human  habit  to  blame  others,  including  machines,  rather  than  blame  oneself. 

It  is  precisely  because  I  am  convinced  that  machines  can  lead  to  a  much  better  use  of
existing  information,  whatever  its  volume,  that  I  feel  that  much  care  should  be  devoted  to
making  users  familiar  with  these  new  methods,  thereby  ensuring  them  access  to  the  documents
they  need. 

The  discussion  on  the  economics  of  information  handling  was  most  interesting.  I  am
afraid,  however,  that  the  useful  starting  point  for  assessing  the  economic  impact  of  the 
use  -  or  possibly,  of  the  non-use  -  of  information  has  still  to  be  looked  for.  To  my 
understanding,  creating  a  successful  system  of  storage  and  retrieval  of  information  within 
a  given  field  of  interest  is  only  tackling  one  half  of  the  problem,  even  though  this  system 
might  yield  excellent  results.  Creating  such  a  system  means  only  creating  a  potential.

The  real  value  of  the  potential  does  not  only  depend  on  the  supplier  of  the  system  and  not
even  on  its  quality;  it  depends  upon  what  the  user  is  drawing  from  the  system,  first  in
the  form  of  access  to  available  information  and  secondly,  by  the  user’s  own  and  irreplaceable
act  of  evaluating  the  accessible  information  for  his  own  problem  and  purpose.

The  problem  of  how  to  measure  such  an  impact  at  all  in  its  various  consequences,
positive  or  negative  (i.e.  taking  advantage  of  given  information  for  acting  in  a  way  one
would  not  have  acted  otherwise  or  for  avoiding  an  obvious  mistake)  seems  to  be  almost
insoluble,  especially  if  one  wants  exact  and  convincing  results.

Undeniably,  the  high  expenses  of  creating  machine  systems  constitute  a  formidable
obstacle  as  they  must  be  wrung  out  of  parliaments  which  are  slightly  scared  by  the
complexity  of  the  problems  involved.  I  feel  therefore  that  much  effort  must  be
concentrated  on  all  factors  which  influence  the  economics  of  the  development  and  even  more  the
operation  of  such  systems.  The  potential  such  systems  are  offering  is  of  identical
interest  to  any  industrial  country  in  the  world.  These  countries  are  also  the  largest
originators  of  technical  and  scientific  information.  By  cooperation  they  can  share  the
burden  of  input  in  such  systems,  and  this  is  indeed  a  considerable  burden  because  so  much 
intellectual  work  is  involved.  Sharing  the  burden  means,  at  the  same  time,  individually 
enjoying  all  the  advantages  of  using  a  common  and  centrally  processed  input.  Another 
means  of  improving  the  overall  economics  of  such  systems  is  doubtlessly  the  formulation 
and  strict  observation  of  a  set  of  standards  for  presentation,  codes,  data-display,  etc. 
and,  not  less  important,  the  acceptance  of  minimum  quality  standards  for  abstracting. 

Still  another  problem  would  be  solved  if  an  acceptable  way  could  be  found  to  meet  the
expenses  of  operating  systems  by  the  establishment  of  fair  charges  for  their  use.  I  came
here  wondering  whether  or  not  this  ticklish  point  would  be  touched  by  one  of  the  papers
read  or  during  the  discussions.  Alas  -  I  am  taking  leave  empty-handed  as  far  as  this
problem  is  concerned  but  I  am  open  to  future  reasoning  on  this  topic.

A  higher  degree  of  input  automation  is  recommended  for  the  sake  of  economy,  too.
Rather  complicated  measures  for  vocabulary  control  and  less  expensive  machine  memories
would  be  indispensable.  But  even  with  this,  highly  flexible  character-reading  machinery
would  still  be  a  necessity.


The  mentioned  data  banks  seem  to  offer  much  promise  on  the  condition  that  access  to
documents  containing  valuable  data  is  guaranteed  and  that,  furthermore,  a  very  considerable
scientific  effort  is  performed:  the  data  as  contained  in  the  documents  are  of  value,  in
the  majority  of  cases,  only  when  they  are  related  to  common  standard  bases  and  are
therefore  comparable.  The  needed  standards  are  not  yet  agreed  upon,  to  my  knowledge.  The
extremely  high  life  expectancy  of  the  information  established  by  such  transformed  and
comparable  data  should  justify  the  considerable  effort  which  has  to  be  made  prior  to  their
establishment  and  storage.


The  old  topic  of  “information”  versus  “documentation  access”  showed  up  as  usual.  It
seems  to  me  that  information  really  becomes  information  solely  through  the  evaluation  and
perception  of  a  message  by  the  user  himself.  This  process  is  not  facilitated  but  hampered 
by  any  form  of  predigestion  whatsoever,  even  if  this  predigestion  is  done  for  any  group  of 
users  of  seemingly  identical  interest.  The  value  of  one  and  the  same  document  can  be  much 
at  variance  depending  on  the  problem  it  is  supposed  to  solve.  This  seems  to  lead  to  the 
conclusion  that  it  is  wiser  to  concentrate  efforts  on  providing  fast  and  dependable  access 
to  any  group  of  documents  corresponding  to  a  stated  subject-interest.  Wouldn’t  it  be 
wonderful  if  we  could  claim  that  this  first  and  indispensable  step  had  been  satisfactorily 
achieved?

In  several  places  the  use  of  free  language  has  been  advocated  during  this  symposium.  Is 
it  really  conceivable  that  this  can  be  brought  about  in  multilingual  systems?  I  am  afraid
that  the  corresponding  needs  for  very  large  machine  memories  would  exceed  all  available 
resources,  at  least  financial  ones. 


Coming  now  to  the  summing  up  of  the  summing  up,  I  can  only  state  that  the  problem
remains  as  complex  as  before;  however,  for  its  solution,  as  much  can  be  done  by  improving 
basic  methods  of  recording  the  information,  applying  useful  standards,  carrying  out 
intensive  training  in  information  handling  and  creating  information  centres  staffed  with
subject  expert  teams  as  by  pressing  on  in  the  direction  of  more  and  more  automation. 

I  am  afraid  that  the  striking  difference  between  the  tremendous  speed  of  machine
development  on  the  one  hand  and  the  much  more  modest  human  capacity  for  adaptation  to  the
new  methods  offered,  on  the  other,  will  be  one  of  our  gravest  problems  when  introducing  machine  methods
to  a  “clientele”  consisting  of  individuals.  Speeding  up  this  adaptation  must  therefore  be
an  integral  part  of  our  effort. 

Certainly,  we  must  never  prevent  research  and  inventiveness  from  assisting  us  in  useful 
forms  of  information  handling.  But  we  must  be  very  careful  in  applying  the  new  methods  and 
very  thorough  in  testing  their  impact  on  the  customers.  The  financial  limitations  for 
system  building  will  stay  with  us,  at  least  here  in  Europe,  for  a  very  long  time;  to  increase 
our  resources,  the  indispensable  effort  of  convincing  politicians  and  administrators  of  the 
usefulness  of  using  information  might  well  add  to  our  burden.

Looking  back  at  all  the  stimulating  papers  and  discussions,  one  thing  seems  clear  to  me
regarding  the  future  rôle  of  scientific  information,  especially  in  our  highly  industrialized
countries:  whether  information  has  a  measurable  economic  value  or  not,  it  is  tempting  to
take  over  the  famous  definition  somebody  gave  of  the  term  “tact”:

“When  you  don’t  have  it,  you  must  see  how  you  can  do  without!”

But  I  am  fully  convinced  that  our  countries  cannot  do  without  full  access  to  the  world 
potential  of  existing  scientific  and  technical  information. 


VOTE  OF  THANKS  BY  H. F. VESSEY,  CHAIRMAN  T.I.P.

Mr.  Chairman,  ladies  and  gentlemen,  it  is  my  privilege  and  pleasure  on  behalf  of
AGARD  to  express  our  thanks  to  all  who  have  contributed  to  making  this  symposium  a
success. 

Director  Finn  Lied,  our  Chairman,  who  opened  the  sessions,  regrets  that  he  cannot  be
present  but  is  very  conscious  of  the  assistance  that  has  been  provided  by  the  German
Ministry  of  Defence  and  in  particular  by  Dr.  Beneke,  the  German  National  Delegate  to
AGARD.  In  this  building  the  DGLR  has  done  a  wonderful  piece  of  organisation  in  providing
magnificent  accommodation  and  facilities,  and  in  particular  I  must  congratulate  Mr.  Steckel
on  the  smooth,  unimpeded  flow  of  nearly  200  persons.  When  we
discussed  the  meeting  arrangements  with  Mr.  Steckel,  frankly  I  did  not  believe  that  coffee
breaks  could  be  held  to  20  minutes:  his  confidence  in  his  staff  has  been  fully  justified.
Dr.  Rautenberg  too  has  helped  considerably  in  the  arrangements  and  I  wish  to  thank  him  and
his  staff.  I  am  sure  that  you  will  agree  that  the  interpreters,  sound  technicians  and
projectionists  have  done  extremely  well.

Finally,  I  want  to  thank  the  authors,  session  Chairmen  and  last  but  not  least,  you,  the
audience.  I  have  been  a  little  disappointed  at  the  reaction  from  the  “users”;  you  must  be
more  satisfied  with  your  documentary  services  than  I  expected.  There  is  still  time,  however,
if  you  wish  to  comment,  or  ask  further  questions,  on  papers  you  have  heard.  Take  a  question
paper,  fill  it  in  and  send  it  to  AGARD  in  Paris.

I  am  sure  there  are  many  others  I  should  thank  for  their  assistance;  time  is  short,
however,  and  I  must  apologize  for  omitting  them  and  say  that  the  gratitude  is  no  less 
genuine. 

Most  of  us  have  had  time  only  in  the  evenings  to  see  the  beauties  of  Munich  but  I
certainly  have  been  impressed  by  the  Rathaus  and  by  the  friendliness  of  the  ordinary
Münchener.

I  am  myself  too  close  to  the  subject  and  too  much  involved  to  assess  the  results  of 
the  symposium  objectively  but  I  must  say  that  I  am  very  satisfied. 


Once  again,  our  thanks  to  all  our  German  friends  and  “Auf  Wiedersehen”.